Construction waste detection method and system based on YOLOv11 and wavelet convolution

By embedding wavelet convolution and CBAM attention modules into the YOLOv11 network, the high-frequency detail information processing and cross-scale feature interaction capabilities of the construction waste detection model are enhanced, solving the problems of scale difference and background confusion, and achieving higher detection accuracy and feature representation capabilities.

CN122289656A · Pending · Publication Date: 2026-06-26 · HEFEI UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HEFEI UNIV OF TECH
Filing Date: 2026-04-01
Publication Date: 2026-06-26

Application Information

Patent Timeline

01 Apr 2026: Application
26 Jun 2026: Publication (CN122289656A)

IPC: G06V10/25; G06V10/82; G06V10/44; G06V10/52; G06V10/80; G06V10/764; G06N3/0464; G06N3/045

AI Tagging

Technology Topics

Multi resolution analysis; Computer graphics (images)

Technical Efficacy Phrases

easy to handle; improve features

AI Technical Summary

Technical Problem

Existing YOLO algorithms face problems such as large scale differences, background obfuscation, and loss of high-frequency details in construction waste detection, resulting in insufficient detection accuracy.

Method used

A wavelet convolution module is embedded in the Backbone layer of the YOLOv11 network, and three wavelet convolution modules and two CBAM attention modules are embedded in the Neck layer to construct an object detection model to enhance high-frequency detail information and cross-scale feature interaction capabilities. Multi-scale features are captured through wavelet transform and attention weights are dynamically learned for detection.

Benefits of technology

It improves the accuracy and feature representation capabilities of construction waste detection, enabling more precise identification of construction waste in multi-scale and complex scenarios, thus enhancing the comprehensiveness and accuracy of detection.

✦ Generated by Eureka AI based on patent content.

Figure CN122289656A_ABST

Abstract

This application provides a construction waste detection method and system based on YOLOv11 and wavelet convolution, relating to the field of computer vision. The method includes: constructing a target detection model; inputting acquired construction waste images into the Backbone layer to extract multi-scale features and generate a first feature map; inputting the first feature map into the Neck layer, fusing first feature maps of different depths to capture and enhance feature information at different frequencies, dynamically learning channel and spatial attention weights, and generating a second feature map for target detection of construction waste through the Head layer. This application embeds wavelet convolution modules and CBAM attention modules into the Backbone and Neck layers of YOLOv11 to enhance high-frequency details and cross-scale feature interaction capabilities. Through multi-resolution analysis and parallel detection heads, it can improve the detection accuracy of construction waste in complex scenes.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, specifically to a construction waste detection method and system based on YOLOv11 and wavelet convolution.

Background Technology

[0002] With the acceleration of urbanization, the detection and classification of construction waste has become an important task in urban environmental management. As a byproduct of urbanization, the indiscriminate dumping of construction waste not only occupies valuable land resources but also seriously damages the city's appearance and ecological environment. Therefore, achieving efficient and accurate detection of construction waste has become an urgent task in smart city management and environmental monitoring.

[0003] Early research relied primarily on manual on-site inspections or detection methods based on traditional image processing techniques for identifying construction waste. Traditional methods typically rely on manual features such as color, texture, or shape for identification. However, these methods often suffer from limitations in feature robustness when faced with complex and varied construction scenes, struggling to cope with changes in lighting, object occlusion, and background interference, resulting in low detection efficiency and a high false positive rate. In contrast, deep learning-based object detection algorithms, with their powerful feature extraction capabilities, have gradually become mainstream. Currently, object detection algorithms are mainly divided into two categories: one is the two-stage algorithm represented by Faster R-CNN, which offers high accuracy but slow inference speed; the other is the single-stage algorithm represented by the YOLO series, which achieves a good balance between detection speed and accuracy, making it more suitable for engineering scenarios requiring real-time response.

[0004] While the YOLO series of algorithms performs excellently in general object detection tasks, it still faces many challenges when applied directly to construction waste detection. First, construction waste seen from an aerial perspective typically varies greatly in scale and is densely distributed. Second, the texture features of the waste are easily confused with complex backgrounds such as bare ground and vegetation. Furthermore, standard convolutional neural networks often lose high-frequency detail information during downsampling, limiting the model's ability to capture boundaries and subtle features. Despite the rapid development of computer vision technology, only a few studies have so far applied it to construction waste detection.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this application provides a construction waste detection method and system based on YOLOv11 and wavelet convolution. It addresses two problems: traditional construction waste detection methods have obvious defects, and YOLO series algorithms applied to this task suffer from scale differences, background confusion, and loss of high-frequency details, resulting in insufficient detection accuracy.

[0006] To achieve the above objectives, this application provides the following technical solutions. In a first aspect, embodiments of this application provide a construction waste detection method based on YOLOv11 and wavelet convolution, the method including:
embedding a wavelet convolution module into the Backbone layer of an acquired YOLOv11 network, and embedding three wavelet convolution modules and two CBAM attention modules into the Neck layer of the YOLOv11 network, to construct a target detection model with an enhanced ability to process high-frequency detail information and cross-scale feature interactions, wherein each of the two CBAM attention modules is adjacent to a wavelet convolution module;
inputting the acquired construction waste image into the Backbone layer of the target detection model and, through the wavelet convolution module, using the multi-resolution analysis characteristics of the wavelet transform to capture high-frequency detail information of the image exceeding a preset frequency threshold, extracting multi-scale features, and generating a first feature map;
inputting the first feature map into the Neck layer of the target detection model, fusing first feature maps of different depths and, before each scale output along the path from high-level features to low-level features and along the path from low-level features to high-level features, using the wavelet convolution modules and CBAM attention modules embedded in the Neck layer to capture and enhance feature information of different frequencies respectively, dynamically learning channel and spatial attention weights, and generating a second feature map; and
inputting the second feature map into the Head layer of the target detection model, predicting the category and bounding box of the target based on the second feature map, and performing target detection of construction waste on feature maps of three different scales based on three parallel detection heads.

[0007] In a second aspect, embodiments of this application provide a construction waste detection system based on YOLOv11 and wavelet convolution, the construction waste detection system comprising: a model building module, a first feature processing module, a second feature processing module, and a target detection module.

[0008] Specifically, the model building module is used to embed a wavelet convolution module into the Backbone layer of the acquired YOLOv11 network, and to embed three wavelet convolution modules and two CBAM attention modules into the Neck layer of the YOLOv11 network, so as to build a target detection model with an enhanced ability to process high-frequency detail information and cross-scale feature interactions, wherein each of the two CBAM attention modules is adjacent to a wavelet convolution module;
the first feature processing module is used to input the acquired construction waste image into the Backbone layer of the target detection model and, through the wavelet convolution module, to use the multi-resolution analysis characteristics of the wavelet transform to capture high-frequency detail information of the image exceeding a preset frequency threshold, extract multi-scale features, and generate the first feature map;
the second feature processing module is used to input the first feature map into the Neck layer of the target detection model, fuse the first feature maps of different depths and, before each scale output along the path from high-level features to low-level features and along the path from low-level features to high-level features, capture and enhance feature information of different frequencies through the wavelet convolution modules and CBAM attention modules embedded in the Neck layer, dynamically learn channel and spatial attention weights, and generate the second feature map;
the target detection module is used to input the second feature map into the Head layer of the target detection model, predict the category and bounding box of the target based on the second feature map, and perform target detection of construction waste on feature maps of three different scales based on three parallel detection heads.

[0009] In a third aspect, embodiments of this application provide an electronic device, which includes a processor, a memory, and a program stored in the memory and executable on the processor. When the program is executed by the processor, it implements the construction waste detection method based on YOLOv11 and wavelet convolution described in the first aspect above.

[0010] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a program or instructions that, when executed by a processor, implement the construction waste detection method based on YOLOv11 and wavelet convolution described in the first aspect above.

[0011] This application provides a method and system for detecting construction waste based on YOLOv11 and wavelet convolution. Compared with existing technologies, it has the following advantages: This application enhances the model's ability to process high-frequency details and to interact across scales by embedding a wavelet convolution module in the Backbone layer of the YOLOv11 network, and three wavelet convolution modules and two CBAM attention modules, each adjacent to a wavelet convolution module, in the Neck layer. By leveraging the multi-resolution analysis characteristics of the wavelet transform, it can effectively capture high-frequency details and extract multi-scale features from images. At the same time, it further captures and enhances feature information of different frequencies along the paths from high-level features to low-level features and from low-level features to high-level features. It dynamically learns channel and spatial attention weights, and then completes detection on feature maps of different scales through three parallel detection heads, thereby improving the detection accuracy and feature representation ability for construction waste in multi-scale and complex scenes.

Attached Figure Description

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a schematic flowchart of a construction waste detection method based on YOLOv11 and wavelet convolution provided in an embodiment of this application;
Figure 2 is a schematic diagram of the structure of a target detection model provided in an embodiment of this application;
Figure 3 is a schematic diagram of the confusion matrix of the target detection model provided in an embodiment of this application on a construction waste detection task;
Figure 4 is a schematic diagram of the precision-recall curve of the target detection model provided in an embodiment of this application on a construction waste detection task;
Figure 5-1 is a schematic diagram of the visual detection results of the target detection model provided in an embodiment of this application in a scene containing complex construction waste;
Figure 5-2 is a schematic diagram of the visual detection results of YOLOv11 in a scene containing complex construction waste, provided in an embodiment of this application;
Figure 5-3 is a schematic diagram of the visual detection results of YOLOv12 in a scene containing complex construction waste, provided in an embodiment of this application;
Figure 6 is a schematic diagram of a construction waste detection system based on YOLOv11 and wavelet convolution provided in an embodiment of this application;
Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0014] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

[0015] This application provides a construction waste detection method and system based on YOLOv11 and wavelet convolution. It addresses two problems: traditional construction waste detection methods have obvious defects, and YOLO series algorithms applied to this task suffer from scale differences, background confusion, and loss of high-frequency details, resulting in insufficient detection accuracy.

[0016] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0017] The following section first introduces a construction waste detection method based on YOLOv11 and wavelet convolution provided in the embodiments of this application.

[0018] Figure 1 is a schematic flowchart of the construction waste detection method based on YOLOv11 and wavelet convolution provided in an embodiment of this application. As shown in Figure 1, the method may include the following steps S110-S140.

[0019] S110. Embed a wavelet convolutional module into the Backbone layer of the acquired YOLOv11 network, and embed three wavelet convolutional modules and two CBAM attention modules into the Neck layer of the YOLOv11 network to construct an object detection model to enhance the ability to process high-frequency detail information and cross-scale feature interaction; wherein, each of the two CBAM attention modules is adjacent to a wavelet convolutional module.

[0020] S120. Input the collected construction waste image into the Backbone layer of the target detection model. Use the wavelet convolution module to capture high-frequency detail information of the image that exceeds the preset frequency threshold, extract multi-scale features and generate the first feature map.

[0021] S130. Input the first feature map into the Neck layer of the object detection model, fuse the first feature maps of different depths, and before each scale output of the path from high-level features to low-level features and the path from low-level features to high-level features, capture and enhance feature information of different frequencies through the wavelet convolution module and CBAM module embedded in the Neck layer respectively, dynamically learn the attention weights of channels and space, and generate the second feature map.

[0022] S140. Input the second feature map into the Head layer of the target detection model, predict the target category and bounding box based on the second feature map, and perform target detection of construction waste on feature maps at three different scales based on three parallel detection heads.

[0023] The above is a specific implementation of a construction waste detection method based on YOLOv11 and wavelet convolution provided in this application. It can be understood that this application constructs a target detection model by embedding a wavelet convolution module into the Backbone layer of the YOLOv11 network, and embedding three wavelet convolution modules and two CBAM attention modules, each adjacent to a wavelet convolution module, into the Neck layer. This ensures that the model has the ability to process high-frequency detailed information and perform cross-scale feature interaction, providing a structural foundation for accurately capturing construction waste features and improving detection performance. This application inputs the collected construction waste images into the Backbone layer. Leveraging the wavelet transform multi-resolution analysis capabilities of the wavelet convolution module, it can accurately capture high-frequency detail information in the images. Simultaneously, it effectively extracts multi-scale features and generates a first feature map, solving the problem of traditional models struggling to capture subtle features of construction waste. After the first feature map is input into the Neck layer, based on the fusion of first feature maps at different depths, the wavelet convolution module and CBAM module embedded in the Neck layer respectively capture and enhance feature information of different frequencies. Furthermore, it can dynamically learn channel and spatial attention weights and generate a second feature map, further strengthening the specificity and effectiveness of feature representation and improving the model's recognition performance for construction waste features of different scales and frequencies. 
The Head layer can perform construction waste target detection on three different scale feature maps using three parallel detection heads, ensuring that the model can adapt to construction waste of different scales and improving the comprehensiveness and accuracy of detection.

[0024] In some embodiments, referring to Figure 2, in the target detection model the Backbone layer includes, from top to bottom, a first convolution module, a second convolution module, a first C3k2 module, a wavelet convolution module, a third convolution module, a second C3k2 module, a fourth convolution module, a third C3k2 module, a fifth convolution module, a fourth C3k2 module, a fast spatial pyramid pooling (SPPF) module, and a parallel spatial attention convolution (C2PSA) module, forming a continuous feature downsampling and extraction path.

[0025] In the embodiments of this application, referring to Figure 2, WaveletConv denotes the wavelet convolution module, Conv denotes the convolution module, and C3k2 denotes the core CSP bottleneck module that YOLOv11 uses in place of C2f. C3k2 is a key component of the Backbone and Neck and is responsible for high-frequency detail extraction. In the name C3k2, C3 denotes the two-branch CSP structure with three convolutions; k denotes the convolution kernel used inside the Bottleneck structure, here a small 3×3 kernel; and 2 denotes that two 3×3 convolutions are stacked inside the Bottleneck, giving an equivalent receptive field of 5×5 that focuses on capturing local details. The Bottleneck is the basic unit from which C3k2 is built.
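The statement that two stacked 3×3 convolutions have an equivalent 5×5 receptive field can be checked with the usual receptive-field recurrence. The following is a minimal sketch; the function name and the (kernel, stride) layer encoding are illustrative choices and do not come from the application.

```python
def receptive_field(layers):
    """Receptive field (along one axis) of a stack of convolution layers.

    layers: list of (kernel_size, stride) tuples, applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # stride scales the spacing between output samples
    return rf


two_stacked_3x3 = receptive_field([(3, 1), (3, 1)])
single_5x5 = receptive_field([(5, 1)])
print(two_stacked_3x3, single_5x5)
```

Both stacks cover a 5-pixel extent per axis, while the two 3×3 kernels use 18 weights per input-output channel pair instead of 25.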

[0026] Understandably, this backbone layer forms a continuous feature downsampling and extraction path through a series of cascaded convolutional modules, multiple C3k2 modules, wavelet convolutional modules, fast spatial pyramid pooling (SPPF) modules, and parallel spatial attention convolutional (C2PSA) modules. The alternating cascaded convolutional and C3k2 modules can progressively complete feature downsampling and multi-scale feature extraction. Each C3k2 module can effectively capture high-frequency details to enhance the expression of subtle features of construction waste. The embedded wavelet convolutional module can accurately capture high-frequency details by utilizing the multi-resolution analysis characteristics of wavelet transform, making up for the shortcomings of standard convolutional downsampling, which is prone to losing subtle features. The fast spatial pyramid pooling (SPPF) module can expand the receptive field and fuse local and global features to enhance the perception of construction waste at different scales. The parallel spatial attention convolutional (C2PSA) module can enhance target features and suppress background interference through the parallel spatial attention mechanism, thereby providing high-quality and robust basic features for subsequent feature fusion and detection output, ultimately improving the accuracy and reliability of construction waste detection.

[0027] In some embodiments, the wavelet convolution module uses discrete wavelet transform to process image signals; the discrete wavelet transform decomposes the image signal into multiple sub-signals of different frequencies; the multiple sub-signals are divided into low-frequency approximate signals and high-frequency detail signals based on a preset frequency threshold. The low-frequency approximate signals are used to characterize the contour and overall structure of the image, while the high-frequency detail signals contain image edge information and image texture information.

[0028] It should be noted that this application is not merely a simple module-stacking improvement of the YOLOv11 model, but rather proposes a new paradigm for object detection that deeply integrates frequency domain analysis and spatial domain feature extraction. Existing pure spatial domain convolutional methods, such as standard convolutional neural networks, typically process image information without separating its frequency components: high-frequency detail information (such as the irregular contours and fine textures of construction waste) is processed together with low-frequency background information. As the number of network layers increases and downsampling is repeated, high-frequency details tend to become blurred and lost.

[0029] This application proposes a novel paradigm that achieves a decomposition-then-learning approach by embedding a wavelet convolution module within the network. The wavelet convolution module precisely decomposes the image signal into a low-frequency approximation representing the overall contour and a high-frequency detail signal representing edges, textures, and other details. When the wavelet convolution module is embedded in the network, the feature map is no longer a chaotic whole during transmission but is explicitly decomposed into sub-feature maps of multiple frequency bands. The network can directly see and learn these separated high-frequency details, rather than tediously fitting from mixed information. This novel paradigm enhances details directly at the feature level without relying on image preprocessing. This application integrates wavelet transform as a differentiable, end-to-end trainable layer (the wavelet convolution module) within the network, allowing the network to autonomously learn during training how to most effectively utilize these separated high-frequency and low-frequency information based on the final detection task loss, performing dynamic, adaptive, end-to-end detail enhancement at the feature level.

[0030] In some embodiments, referring to Figure 2, in the target detection model the Neck layer includes a first uplink path, a second uplink path, a first downlink path, a second downlink path, and a third downlink path.
In the first uplink path, the output of the parallel spatial attention convolution (C2PSA) module passes through the first upsampling and then undergoes the first concatenation fusion with the output of the third C3k2 module.
In the second uplink path, the output of the first concatenation fusion passes sequentially through the fifth C3k2 module and the second upsampling, and then undergoes the second concatenation fusion with the output of the second C3k2 module.
In the first downlink path, the output of the second concatenation fusion passes sequentially through the sixth C3k2 module, a wavelet convolution module, and one CBAM attention module before being output to the first detection head of the Head layer.
In the second downlink path, the output of that CBAM attention module passes through the sixth convolution module and is then concatenated and fused with the output of the fifth C3k2 module; the fused output passes sequentially through the seventh C3k2 module, a wavelet convolution module, and the other CBAM attention module before being output to the second detection head of the Head layer.
In the third downlink path, the output of the other CBAM attention module passes through the seventh convolution module and is then concatenated and fused with the output of the C2PSA module; the fused output passes sequentially through the eighth C3k2 module and a wavelet convolution module before being output to the third detection head of the Head layer.
In the embodiments of this application, it can be understood that the Neck layer constructs a multi-directional feature interaction structure including a first uplink path, a second uplink path, a first downlink path, a second downlink path, and a third downlink path. In the first uplink path and the second uplink path, the output of the C2PSA module is sequentially processed through the aforementioned first concatenation fusion and second concatenation fusion, thereby realizing the cross-layer transmission and fusion of high-level semantic features and low-level detail features, and enhancing the interaction capability of multi-scale features.
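The Neck wiring described in paragraph [0030] can be written down as a small graph and checked against the stated module counts. This is only a sketch of the topology as read from the text: the node names (`up_1`, `cat_1`, `wt_1`, and so on) are hypothetical labels, and channel widths, strides, and upsampling factors are not modelled because the text does not give them.

```python
from collections import Counter

# Backbone outputs consumed by the Neck (hypothetical labels).
BACKBONE_OUTPUTS = {"c3k2_2", "c3k2_3", "c2psa"}

# node name -> (module type, input node names), listed in execution order.
NECK = {
    "up_1":   ("Upsample",    ["c2psa"]),             # first uplink path
    "cat_1":  ("Concat",      ["up_1", "c3k2_3"]),
    "c3k2_5": ("C3k2",        ["cat_1"]),             # second uplink path
    "up_2":   ("Upsample",    ["c3k2_5"]),
    "cat_2":  ("Concat",      ["up_2", "c3k2_2"]),
    "c3k2_6": ("C3k2",        ["cat_2"]),             # first downlink path
    "wt_1":   ("WaveletConv", ["c3k2_6"]),
    "cbam_1": ("CBAM",        ["wt_1"]),              # feeds detection head 1
    "conv_6": ("Conv",        ["cbam_1"]),            # second downlink path
    "cat_3":  ("Concat",      ["conv_6", "c3k2_5"]),
    "c3k2_7": ("C3k2",        ["cat_3"]),
    "wt_2":   ("WaveletConv", ["c3k2_7"]),
    "cbam_2": ("CBAM",        ["wt_2"]),              # feeds detection head 2
    "conv_7": ("Conv",        ["cbam_2"]),            # third downlink path
    "cat_4":  ("Concat",      ["conv_7", "c2psa"]),
    "c3k2_8": ("C3k2",        ["cat_4"]),
    "wt_3":   ("WaveletConv", ["c3k2_8"]),            # feeds detection head 3
}
HEAD_INPUTS = ["cbam_1", "cbam_2", "wt_3"]


def check_wiring(neck, sources):
    """Verify every input is produced earlier; return a count of module types."""
    seen = set(sources)
    for name, (_, inputs) in neck.items():
        missing = [i for i in inputs if i not in seen]
        if missing:
            raise ValueError(f"{name} uses undefined inputs: {missing}")
        seen.add(name)
    return Counter(module for module, _ in neck.values())


counts = check_wiring(NECK, BACKBONE_OUTPUTS)
print(counts["WaveletConv"], counts["CBAM"], len(HEAD_INPUTS))
```

Read this way, the graph contains three wavelet convolution modules and two CBAM modules, each CBAM directly follows a wavelet convolution module, and the third detection head is fed by a wavelet convolution module without a CBAM stage.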

[0031] Furthermore, in the first, second, and third downlink paths, by processing the second concatenated and fused features through different C3k2 modules, wavelet convolution modules, and CBAM attention modules, feature information of different frequencies can be captured. The channel and spatial attention weights are dynamically learned to enhance target features and suppress background interference. At the same time, the processed multi-scale features are output to the three detection heads in the Head layer, realizing accurate adaptation and comprehensive detection of construction waste at different scales. This provides a reliable feature fusion and output guarantee for improving the overall detection accuracy and robustness of the model.

[0032] In one example, the CBAM attention module is an attention mechanism module used to enhance the performance of convolutional neural networks. The CBAM attention module includes a channel attention module and a spatial attention module. By introducing channel attention and spatial attention into the convolutional neural network (CNN), the perceptual ability of the model is improved, thereby improving performance without increasing network complexity. Channel attention is used to enhance the feature representation of different channels, and spatial attention is used to extract information from different locations in space. The channel attention module and the spatial attention module can be embedded in different layers of the CNN to enhance feature representation.
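The application describes CBAM only at the level of its two sub-modules. As an illustration, the sketch below follows the commonly published CBAM formulation: channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from a convolution over the channel-wise average and max maps. All weights are made-up placeholders, and a 3×3 spatial kernel is used for brevity where CBAM implementations typically use 7×7; none of these values come from the application.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists; w1: (C/r) x C and w2: C x (C/r) shared MLP weights."""
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]  # global average pool
    mx = [max(map(max, ch)) for ch in feat]                            # global max pool
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, kernel):
    """kernel: 2 x k x k weights over the [average, max] channel-pooled maps (k odd)."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    pooled = [
        [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)],
        [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)],
    ]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding at the borders
                            s += kernel[p][di][dj] * pooled[p][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    mc = channel_attention(feat, w1, w2)  # one weight per channel
    f1 = [[[mc[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    ms = spatial_attention(f1, kernel)    # one weight per spatial position
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)] for ch in f1]


# Placeholder input (2 channels, 2 x 2) and placeholder weights.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [1.0, 2.0]]]
w1 = [[0.5, -0.5]]
w2 = [[1.0], [-1.0]]
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, which is how the module re-weights channels and positions without changing the feature map shape.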

[0033] It is important to emphasize that this application employs a targeted module placement strategy in its network structure. For example, a wavelet convolution module (WaveletConv) is embedded in the early stages of the backbone to preemptively preserve high-frequency details and prevent their loss in deeper layers. Simultaneously, wavelet convolution modules (WaveletConv) and CBAM attention modules are inserted before each key-scale output of the neck to achieve multi-scale, adaptive detail enhancement and fusion. This ensures that detailed information is preserved and enhanced at each stage of feature fusion, enabling the model to adaptively focus on the most important details for targets at different scales.

[0034] Specifically, in convolutional neural networks, the resolution of feature maps decreases continuously as the network deepens through downsampling operations (e.g., convolutions with a stride of 2). This process may be the most severe stage for the loss of high-frequency detail information; once information is lost during downsampling, subsequent network layers may be unable to recover it. Therefore, this application intervenes as early as possible before the feature maps lose a large amount of detail information due to multiple downsampling operations. By embedding the wavelet convolution module WaveletConv in the early stages of the backbone (i.e., when the feature map resolution is still relatively high), high-frequency details (edges, textures) in the image can be decomposed through wavelet transform and preserved in specific high-frequency sub-band channels in a timely manner. In this way, even if the spatial resolution of the feature maps decreases later, the necessary key detail information has already been encoded into the feature channels and can continue to be transmitted in the network, providing the original information for subsequent recognition.
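The argument that detail lost in downsampling cannot be recovered, whereas wavelet sub-bands carry it forward in channels, can be illustrated on a toy image. The sketch below assumes a single-level orthonormal Haar transform (the Haar choice is stated in paragraph [0036]); the function names and the checkerboard test image are illustrative and not taken from the application.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of an H x W image (H and W even).

    Returns four half-resolution sub-bands [LL, LH, HL, HH].
    """
    H, W = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (W // 2) for _ in range(H // 2)] for _ in range(4))
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # local average (low frequency)
            lh[r][s] = (a - b + c - d) / 2  # change along the width
            hl[r][s] = (a + b - c - d) / 2  # change along the height
            hh[r][s] = (a - b - c + d) / 2  # diagonal change
    return [ll, lh, hl, hh]


def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild the full-resolution image from the four sub-bands."""
    ll, lh, hl, hh = bands
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (p + q + u + v) / 2
            img[2 * r][2 * s + 1] = (p - q + u - v) / 2
            img[2 * r + 1][2 * s] = (p + q - u - v) / 2
            img[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return img


def avg_pool2(img):
    """2 x 2 average pooling with stride 2, a plain downsampling baseline."""
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4
             for j in range(0, len(img[0]), 2)] for i in range(0, len(img), 2)]


checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]  # fine texture
flat = [[0.5] * 4 for _ in range(4)]                           # no texture

same_after_pooling = avg_pool2(checker) == avg_pool2(flat)
hh_differs = haar_dwt2(checker)[3] != haar_dwt2(flat)[3]
restored = haar_idwt2(haar_dwt2(checker))
```

After average pooling, the checkerboard and the flat gray image are indistinguishable, while the Haar HH sub-band still separates them and the four sub-bands together reconstruct the original image exactly.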

[0035] The core task of the Neck is to fuse feature maps from different depths in the backbone to balance high-level semantic information and low-level spatial details, thereby enabling the detection of targets of different sizes. In the Neck, the wavelet convolution module WaveletConv is inserted to preserve and enhance detail information at each stage of the fusion process. When deep high-level semantic features are fused with shallow detail features, without special processing, the detail information can easily be diluted by the strong semantic information. This application inserts the wavelet convolution module WaveletConv before the output at each key scale, where it again performs frequency-domain decomposition on the fused features, ensuring that high-frequency detail information remains prominent and clear when generating the predicted feature map at that scale. This is crucial for detecting waste of the corresponding size at that scale. The insertion of the CBAM attention module in the Neck makes detail enhancement more targeted. The key features of waste of different sizes may behave differently at different scales. For example, for large pieces of waste, the internal texture may be more important; for small pieces of waste, the contour edges may be more critical. Adding a CBAM attention module before the outputs at each scale of the Neck allows the model to learn attention weights for different scales; it can focus more on texture regions in the feature maps responsible for large targets and more on edge positions in the feature maps responsible for small targets. This strategy may ensure that at each stage of multi-scale feature fusion, the model can adaptively focus on the most important details at the current scale, thus potentially achieving accurate detection of waste of various sizes from large to small.

[0036] In some embodiments, the wavelet convolution module is used for: S210. Perform a two-dimensional first-order Haar wavelet transform, achieved by combining operations along the width and height spatial dimensions of the input image. Four 2×2 filters are used as four convolution kernels, which form a set of orthonormal bases. The four convolution kernels are a low-pass filter $f_{LL}$ and three high-pass filters $f_{LH}$, $f_{HL}$ and $f_{HH}$.

[0037] For example, the low-pass filter $f_{LL}$ and the high-pass filters $f_{LH}$, $f_{HL}$ and $f_{HH}$ satisfy the following expressions respectively:

$$f_{LL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\quad f_{LH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix},\quad f_{HL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix},\quad f_{HH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix};$$

S220. Perform a depthwise convolution operation with a stride of 2 on the input image. Convolve the input image with the four convolution kernels respectively, and output feature maps for four channels:

$$[X_{LL}, X_{LH}, X_{HL}, X_{HH}] = \mathrm{Conv}([f_{LL}, f_{LH}, f_{HL}, f_{HH}], X);$$

where $X$ is the input image and $\mathrm{Conv}$ denotes convolution; the resolution of the feature maps for all four channels is half the resolution of the input image. $X_{LL}$ is the low-frequency component of the input image that is below a preset frequency threshold. $X_{LH}$ is the horizontal high-frequency component of the input image that exceeds a preset frequency threshold. $X_{HL}$ is the vertical high-frequency component of the input image that exceeds a preset frequency threshold. $X_{HH}$ is the diagonal high-frequency component of the input image that exceeds a preset frequency threshold.

[0038] S230. Perform the inverse wavelet transform by transposed convolution: use the four convolution kernels in a transposed convolution operation to reconstruct the feature map, and achieve cascaded wavelet decomposition by recursively decomposing the low-frequency component.
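Steps S210 to S230 can be sketched on a single channel as follows. This is a minimal illustration under stated assumptions, not the module of this application: it assumes the standard orthonormal 2×2 Haar kernels, operates on plain nested lists instead of tensors, and the names `HAAR`, `haar_forward`, `haar_inverse` and `haar_cascade` are hypothetical.

```python
# Four 2x2 Haar kernels forming an orthonormal basis (assumed standard values):
# LL is the low-pass kernel, LH / HL / HH are the high-pass kernels.
HAAR = {
    "LL": [[0.5, 0.5], [0.5, 0.5]],
    "LH": [[0.5, -0.5], [0.5, -0.5]],
    "HL": [[0.5, 0.5], [-0.5, -0.5]],
    "HH": [[0.5, -0.5], [-0.5, 0.5]],
}


def haar_forward(x):
    """S220: stride-2 convolution of one channel with each of the four kernels.
    Returns four sub-bands, each at half the input resolution."""
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    bands = {}
    for name, k in HAAR.items():
        bands[name] = [
            [sum(k[a][b] * x[i + a][j + b] for a in range(2) for b in range(2))
             for j in range(0, w, 2)]
            for i in range(0, h, 2)
        ]
    return bands


def haar_inverse(bands):
    """S230: stride-2 transposed convolution with the same four kernels.
    Each coefficient is spread back over its 2x2 block and the results summed."""
    h, w = len(bands["LL"]), len(bands["LL"][0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for name, k in HAAR.items():
        for i in range(h):
            for j in range(w):
                for a in range(2):
                    for b in range(2):
                        x[2 * i + a][2 * j + b] += k[a][b] * bands[name][i][j]
    return x


def haar_cascade(x, levels):
    """S230: cascaded decomposition, re-decomposing the low-frequency band.
    Returns one dict of sub-bands per level, finest level first."""
    pyramid = []
    for _ in range(levels):
        bands = haar_forward(x)
        pyramid.append(bands)
        x = bands["LL"]
    return pyramid
```

Because the four kernels are orthonormal, `haar_inverse(haar_forward(x))` returns the input up to floating-point rounding, which is the recoverability property the transposed convolution relies on.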

[0039] In the embodiments of this application, it can be understood that this application performs a two-dimensional first-order Haar wavelet transform and uses four 2×2 filters to form a set of orthonormal bases, which serve as one low-pass filter and three high-pass filters. This provides a standardized orthogonal basis for subsequent high-frequency detail capture and feature decomposition. This application performs a depthwise convolution operation with a stride of 2 on the input image, using four convolution kernels to obtain feature maps for four channels. While reducing the feature map resolution to half that of the input image, it separates the low-frequency component from the three types of high-frequency components (horizontal, vertical, and diagonal), capturing high-frequency detail information exceeding a preset frequency threshold and compensating for the tendency of standard convolutional downsampling to lose subtle features.

[0040] Furthermore, this application reconstructs the feature map by performing inverse wavelet transform through transposed convolution, and achieves cascaded wavelet decomposition by recursively decomposing low-frequency components. This not only ensures the recoverability and integrity of the features, but also enhances the ability to extract features of different frequencies through multi-scale decomposition. This provides richer and more discriminative feature representations for subsequent feature fusion and detection, thereby improving the model's ability to capture subtle features of construction waste and its detection accuracy.

[0041] In some embodiments, the wavelet convolution module incorporates an adjustment unit, which is used for: S310. For each local feature region to be decomposed in the wavelet convolution module, analyze the numerical fluctuation within the region, calculate the degree of local numerical change, and determine whether the current region is affected by occasional data deviations in the storage unit, and whether there are jitter signals or fuzzy signals caused by noise.

[0042] S320. Set a sliding window for each input feature map. For each pixel within the sliding window, calculate the numerical dispersion of the surrounding local region, for example a 3×3 or 5×5 pixel area, to obtain the level of local numerical fluctuation. The dispersion can be measured using the local variance. The larger the local variance, the more drastic the numerical fluctuation in that region, and the more likely the region is to contain small data deviations.

[0043] S330. Compare the local numerical fluctuation level with the preset fluctuation threshold and perform adaptive parameter adjustment to generate a high-frequency sub-feature map that has been purified and exceeds the preset frequency threshold, so as to characterize the edges and textures of construction waste in the aerial image and avoid artifacts or blurring caused by data deviation.
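The fluctuation analysis in steps S310 to S330 can be sketched as a local variance map followed by a threshold comparison. This is a minimal illustration, assuming population variance as the dispersion measure and a window clipped at the image border; the function names `local_variance_map` and `fluctuation_mask` and the threshold value used in the usage note are hypothetical.

```python
def local_variance_map(feature, window=3):
    """S320: variance of the window x window neighborhood around each pixel.
    At the borders only the in-bounds part of the window is used (assumption)."""
    h, w = len(feature), len(feature[0])
    r = window // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [feature[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            mean = sum(vals) / len(vals)
            out[i][j] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out


def fluctuation_mask(variance, threshold):
    """S330 comparison: True where the local fluctuation level exceeds
    the preset fluctuation threshold."""
    return [[v > threshold for v in row] for row in variance]
```

For a 5×5 map of zeros with a single value of 9.0 at the center, the center pixel has a local variance of 8.0 with a 3×3 window, so `fluctuation_mask(variance, 1.0)` flags it while leaving the flat corners unflagged.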

[0044] In this embodiment, it should be noted that the construction waste images are collected by drones. When the target detection model is deployed on the drone's onboard computing unit, in actual operation, the drone needs to fly continuously for long periods of time at high intensity, especially during the high temperatures of summer. This results in a high load on the cooling system of the onboard computing unit, and the GPU core temperature approaches its design limit. This continuous high-temperature operation causes occasional bit flips or data read errors in the storage units around the GPU core (such as GDDR memory), introducing small, instantaneous numerical deviations. When the wavelet convolution module loads these feature maps with small deviations for wavelet decomposition, especially the high-frequency components, their values are no longer completely accurate, carrying slight jitter or blurry information, weakening the ability to capture details. These errors further affect the spatial accuracy of feature fusion in the Neck part, causing slight misalignment or blurry details in the fused feature map. Consequently, the CBAM attention module has difficulty accurately identifying real edges and textures, and may misallocate attention, weakening the focusing ability. Ultimately, the model cannot fully utilize high-frequency details when predicting in the Head part, resulting in a significant decrease in the detection accuracy and recall rate for construction waste with blurred edges, small scale, and dense distribution.

[0045] Based on this, this application provides an implementation scheme that introduces an adjustment unit into the wavelet convolution module. The adjustment unit can achieve feature purification and adaptive optimization through steps S310-S330. First, the numerical fluctuation of each local feature region is analyzed to accurately identify occasional data deviations and jitter or blurry signals caused by noise. Then, the local numerical dispersion is calculated through a sliding window to accurately quantify the local numerical fluctuation level. After obtaining the local numerical fluctuation level, the purified high-frequency sub-feature map is generated by comparing it with a preset fluctuation threshold and adjusting the parameters adaptively. This effectively represents the edges and textures of construction waste, avoids artifacts or blurring caused by data deviations, significantly improves the accuracy and robustness of feature expression, and provides high-quality feature support for subsequent high-precision detection.

[0046] In some embodiments, the aforementioned adaptive parameter adjustment includes: for local regions where the local numerical fluctuation level exceeds a preset fluctuation threshold, setting a detail retention threshold for the generation of high-frequency subbands to suppress irregular jitter or fuzzy signals caused by data deviation; for local regions where the local numerical fluctuation level does not exceed the preset fluctuation threshold, retaining the original decomposition strategy so that all real detail information is fully extracted.
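The differentiated strategy above can be sketched as follows. This is only one possible reading: it assumes that the detail retention threshold acts as a hard threshold on coefficient magnitude, and that the fluctuation map has already been brought to the same resolution as the high-frequency sub-band. The function name `purify_high_band` and all parameter names are hypothetical.

```python
def purify_high_band(band, fluctuation, fluct_threshold, detail_threshold):
    """Return a purified copy of one high-frequency sub-band.

    band        -- 2D list of wavelet coefficients (e.g. one high-pass sub-band)
    fluctuation -- 2D list of local fluctuation levels, same shape as band
    Where the fluctuation exceeds fluct_threshold, coefficients whose magnitude
    is below detail_threshold are set to zero (assumed hard thresholding).
    Elsewhere the original decomposition result is kept unchanged.
    """
    purified = []
    for band_row, fluct_row in zip(band, fluctuation):
        row = []
        for coeff, fluct in zip(band_row, fluct_row):
            if fluct > fluct_threshold and abs(coeff) < detail_threshold:
                row.append(0.0)    # weak coefficient in an unstable region
            else:
                row.append(coeff)  # stable region, or a strong coefficient
        purified.append(row)
    return purified
```

With this rule a weak coefficient is removed only when its neighborhood is flagged as unstable, so the same weak coefficient in a stable region survives as genuine detail.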

[0047] In the embodiments of this application, it is understood that the adaptive parameter adjustment achieves precise purification and maximum retention of features through a differentiated strategy. For areas where the local numerical fluctuation level exceeds the preset fluctuation threshold, this application explicitly sets a detail retention threshold to effectively suppress irregular jitter or blurred signals caused by data deviation and remove noise interference. For areas that do not exceed the preset fluctuation threshold, the original decomposition strategy is maintained to ensure that real detail information is not mistakenly deleted and is completely extracted. Thus, while purifying features, key high-frequency features such as the edges and textures of construction waste are retained to the maximum extent, improving the accuracy and completeness of feature expression and providing high-quality and robust feature support for subsequent detection.

[0048] It's important to note that the adjustment unit acts like a smart filter for the wavelet convolution module. Before the wavelet convolution module begins decomposing image details, the adjustment unit quickly scans the image region to be processed, calculating the degree of numerical variation within this small region. If it detects abnormally large numerical fluctuations in a certain region, it determines that this may be due to data jitter caused by high hardware temperatures. Once this anomaly is detected, the adjustment unit becomes more selective, setting a stricter standard for extracting high-frequency details. Only very obvious and coherent edges and textures are considered true details and preserved. Weak, irregular signals, which are likely caused by data jitter, are gently filtered out. If the data quality of the image region is good and there is no obvious jitter, the original high-precision decomposition method is maintained, ensuring that all true details are fully extracted. This approach directly addresses the issue of occasional data deviations in storage units when drones operate at high temperatures. Instead of simply processing all images uniformly, this application adjusts its strategy based on the data conditions of each local area. This is crucial for aerial images containing small, densely distributed construction debris with blurred edges, as these small targets have limited information, and slight data fluctuations can lead to model misjudgments. Through this adaptive detail cleansing, the model can more accurately identify these difficult-to-distinguish targets. In contrast, conventional methods lacking this adaptive capability, such as using a fixed filter for noise reduction, might filter out some genuine, minute details in areas with good data quality, while failing to effectively suppress spurious signals caused by data fluctuations in areas with poor data quality.

[0049] In some embodiments, to train and test the model described in this application, experiments were conducted on a computer running Ubuntu 18.04 with an NVIDIA GeForce RTX 4090 GPU (24 GB of memory), Python 3.8.18, and PyTorch 1.9.0. Weights were randomly initialized, and all models were trained from scratch. Table 1 shows the hyperparameter settings of the network models.

[0050] As listed in Table 1, the main parameter settings used in this experiment are as follows: the number of training epochs is set to 200, meaning that the entire training dataset is traversed 200 times; the batch size is set to 16, meaning that 16 training samples are input to the model in each iteration; the learning rate is set to 0.01, which controls the step size of each parameter update; the weight decay is set to 0.0005, which acts as a regularization term to suppress overfitting during training; the optimizer is stochastic gradient descent (SGD); and the input image size is uniformly set to 800 to keep the input data specifications consistent.
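For reference, the settings above can be collected in a small configuration object. This is a sketch only: the class name `TrainConfig`, its field names, and the helper `iterations_per_epoch` are hypothetical, and the dataset size in the usage note is an arbitrary illustrative number, not a figure from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the text (field names are assumptions)."""
    epochs: int = 200             # full passes over the training set
    batch_size: int = 16          # samples per iteration
    learning_rate: float = 0.01   # parameter update step size
    weight_decay: float = 0.0005  # regularization strength
    optimizer: str = "SGD"        # stochastic gradient descent
    image_size: int = 800         # uniform input size


def iterations_per_epoch(num_images, cfg):
    """Parameter updates per epoch; a final partial batch counts as one."""
    return -(-num_images // cfg.batch_size)  # ceiling division
```

For example, a hypothetical training set of 1000 images would give `iterations_per_epoch(1000, TrainConfig())` equal to 63 updates per epoch.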

[0051] The evaluation metrics used in the test are precision, recall, and mean average precision (mAP). For a binary classification problem, samples can be divided into four types: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Precision, TP / (TP + FP), measures how many of the returned results are relevant, while recall, TP / (TP + FN), measures how many of the truly relevant results are returned. This test uses mAP, the most widely used standard, to comprehensively evaluate the model in terms of both bounding-box localization and classification performance.
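The metrics above can be sketched as follows. This is a minimal illustration, assuming all-point interpolation of the precision-recall curve for average precision (evaluation toolkits may use a different interpolation); the function names and the example numbers in the usage note are hypothetical and are not results from this application.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_hits, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    """
    ranked = sorted(scored_hits, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection, by confidence
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    ap, prev_recall = 0.0, 0.0
    for idx, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or higher.
        best_precision = max(p for _, p in points[idx:])
        ap += (recall - prev_recall) * best_precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)
```

As an illustration, three detections ranked as hit, miss, hit against two ground-truth objects give an average precision of 5/6, since the first half of the recall range is covered at precision 1 and the second half at precision 2/3.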

[0052] Based on the experimental results, Figure 3 shows the confusion matrix of the object detection model on the construction waste detection task. As the confusion matrix in Figure 3 shows, the model performs well in predicting the "Waste" and "background" categories. The values on the main diagonal of the matrix represent the proportion of instances correctly predicted by the model. "Waste" and "background" are the core classification labels for the detection task. "Waste" corresponds to construction waste and is the target class; "background" corresponds to the background and is a non-target class.

[0053] Specifically, the proportion of "Waste" instances that were correctly predicted is 0.67, indicating that the model recognizes most construction waste. Meanwhile, the proportion of "background" that was correctly predicted is 1.00, meaning the model effectively distinguishes background and rarely misidentifies background as waste. The off-diagonal elements of the matrix represent the degree of confusion between categories. For example, the proportion of true "Waste" incorrectly predicted as "background" is 0.33, indicating that the model misclassifies some difficult-to-distinguish fragments of waste as background. Confusion in the opposite direction is negligible, reflecting the ability of the object detection model to separate construction waste from background and the reliability of its positive predictions.

[0054] Figure 4 shows the Precision-Recall (PR) curve of the object detection model on the construction waste detection task. As can be seen from Figure 4, the PR curve for the "Waste" category lies toward the upper right corner of the coordinate system, indicating that the object detection model can achieve a high recall rate while maintaining a high precision rate. In particular, the mean average precision (mAP@0.5) for all categories reached 0.636, which shows that the model performs well in detecting construction waste at a fixed IoU threshold of 0.5. IoU (Intersection over Union) is a core metric for measuring the degree of overlap between the model's predicted bounding box and the ground truth bounding box. A fixed IoU threshold serves as the critical value for deciding whether a prediction counts as a positive or negative sample and thus for determining detection results in object detection. It is a core hyperparameter in the training and inference stages of the YOLO series models and a fundamental criterion for calculating the PR curve, recall, and precision.
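The IoU criterion described above can be sketched as follows. This is a minimal illustration assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the function names `iou` and `is_true_positive` are hypothetical, and the box coordinates in the usage note are illustrative values only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2),
    with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a true positive when its IoU with the
    ground truth box reaches the fixed threshold (0.5 for mAP@0.5)."""
    return iou(pred_box, gt_box) >= threshold
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, giving an IoU of 1/7, which falls below the 0.5 threshold and would therefore not count as a match.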

[0055] The shape of this curve indicates that the model's precision did not significantly decrease while efforts were made to improve recall to detect more waste. This excellent performance is attributed to the capabilities of the YOLOv11 model, wavelet convolution module, and CBAM attention module, which effectively enhanced the model's ability to capture multi-scale features and pay attention to key regions, thereby reducing false negatives and significantly improving the model's detection performance in complex architectural scenes.

[0056] Furthermore, to evaluate the performance and effectiveness of the object detection model, this application conducted comparative experiments with the baseline models YOLOv11 and YOLOv12 on the same test set. Table 2 details the performance metrics of the object detection model, YOLOv11, and YOLOv12 on the test set. The object detection model achieved the best performance of 63.6 on the comprehensive evaluation metric of mAP@0.5, which is higher than YOLOv11's 62.5 and YOLOv12's 61.0. This indicates that the object detection model is more accurate in locating and classifying construction waste while maintaining overall detection accuracy.

[0057] In addition to the metrics in Table 2, and to demonstrate the performance differences between the models more intuitively, Figure 5-1, Figure 5-2 and Figure 5-3 compare the visual detection results of the object detection model, YOLOv11, and YOLOv12 in scenes containing complex construction waste. The figures show that while the baseline models YOLOv11 and YOLOv12 can detect most waste in some typical scenarios, they exhibit some false detections and missed detections when handling waste that is densely distributed, small, or hard to distinguish from the background. For example, as shown in Figure 5-1, Figure 5-2 and Figure 5-3, YOLOv11 and YOLOv12 may fail to recognize partially occluded waste or waste similar in color to its surroundings. In contrast, the object detection model demonstrates stronger detection capabilities. It not only identifies the waste that the baseline models can detect, but also shows better detection performance for smaller piles of waste and areas with blurred edges, and it detects more details. The comparative experimental results demonstrate the advantage of the object detection model in the construction waste detection task. By combining wavelet convolution and attention mechanisms, the object detection model achieves leading performance in both mAP@0.5 and recall, providing a more reliable solution for intelligent monitoring of construction waste in practical applications.

[0058] This application addresses the challenge of detecting construction waste in complex scenes by proposing an improved object detection model based on the YOLOv11 architecture. This model integrates wavelet convolution modules and CBAM attention modules into the YOLOv11 Backbone and Neck networks. By introducing the wavelet convolution module, the model can more effectively capture high-frequency details in the image, which is crucial for identifying small, irregularly shaped construction waste. Simultaneously, the CBAM attention module dynamically enhances the model's focus on key features, effectively suppressing interference from complex backgrounds, thereby improving the quality and robustness of feature representation.

[0059] Experimental results demonstrate that the object detection model exhibits superior performance in construction waste detection. In comparative experiments on the test set, the object detection model achieved an mAP@0.5 of 63.6 and a recall of 58.7%, both improvements over the baseline models YOLOv11 and YOLOv12. This demonstrates the advantage of the object detection model in terms of detection accuracy and recall. Furthermore, in the visual detection comparisons, the object detection model shows stronger detection capabilities and more accurate bounding box localization when handling densely distributed objects, small objects, or construction waste that is easily confused with the background, reducing missed detections and false detections.

[0060] In summary, the object detection model successfully addresses the challenges faced by existing methods in detecting construction waste in complex architectural scenarios, providing an efficient and reliable solution. The technical solution presented in this application offers strong technical support and new ideas for practical applications such as smart city management, environmental monitoring, and automated sorting of construction waste. Future work could further explore lightweight model design or validate the model's generalization ability on broader and more diverse datasets.

[0061] In some embodiments, this application provides a construction waste detection system 400 based on YOLOv11 and wavelet convolution. As shown in Figure 6, the construction waste detection system 400 based on YOLOv11 and wavelet convolution may include the following modules:
The model building module 410 is used to embed a wavelet convolution module into the Backbone layer of the acquired YOLOv11 network, and to embed three wavelet convolution modules and two CBAM attention modules into the Neck layer of the YOLOv11 network, to build an object detection model with an enhanced ability to process high-frequency detail information and cross-scale feature interaction; wherein each of the two CBAM attention modules is adjacent to a wavelet convolution module.
The first feature processing module 420 is used to input the collected construction waste image into the Backbone layer of the object detection model, use the wavelet convolution module to capture high-frequency detail information of the image that exceeds the preset frequency threshold by utilizing the multi-resolution analysis characteristics of the wavelet transform, extract multi-scale features, and generate the first feature map.
The second feature processing module 430 is used to input the first feature map into the Neck layer of the object detection model, fuse the first feature maps of different depths, and, before each scale output of the path from high-level features to low-level features and of the path from low-level features to high-level features, capture and enhance feature information of different frequencies through the wavelet convolution modules and CBAM attention modules embedded in the Neck layer, dynamically learn channel and spatial attention weights, and generate the second feature map.
The object detection module 440 is used to input the second feature map into the Head layer of the object detection model, predict the target category and bounding box based on the second feature map, and perform target detection of construction waste on feature maps at three different scales based on three parallel detection heads.

[0062] According to embodiments of this application, any and multiple modules among the model building module 410, the first feature processing module 420, the second feature processing module 430, and the object detection module 440 can be combined into one module, or any one of these modules can be split into multiple modules. Alternatively, at least some of the functions of one or more of these modules can be combined with at least some of the functions of other modules and implemented in one module.

[0063] Each module in the system shown in Figure 6 implements the corresponding step of the aforementioned construction waste detection method based on YOLOv11 and wavelet convolution, and can achieve the corresponding technical effect. For the sake of brevity, this is not elaborated here.

[0064] In some embodiments, this application provides an electronic device, a structural schematic of which is shown in Figure 7.

[0065] The electronic device may include a processor 510 and a memory 520 storing computer program instructions.

[0066] Specifically, the processor 510 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.

[0067] Memory 520 may include mass storage for data or instructions. By way of example and not limitation, memory 520 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 520 may include removable or non-removable (or fixed) media. Where appropriate, memory 520 may be internal or external to the electronic device. In a particular embodiment, memory 520 is non-volatile solid-state memory.

[0068] Memory 520 may include read-only memory (ROM), random access memory (RAM), disk storage media device, optical storage media device, flash memory device, electrical, optical, or other physical / tangible memory storage device. Therefore, typically, memory 520 includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it can perform the operations described in any of the YOLOv11 and wavelet convolution-based construction waste detection methods in the above embodiments.

[0069] The processor 510 reads and executes computer program instructions stored in the memory 520 to implement any of the construction waste detection methods based on YOLOv11 and wavelet convolution in the above embodiments.

[0070] In one example, the electronic device may also include a communication interface 530 and a bus 500. As shown in Figure 7, the processor 510, memory 520, and communication interface 530 are connected via bus 500 and communicate with each other.

[0071] The communication interface 530 is mainly used to realize communication between the various modules, devices, units and / or equipment in the embodiments of this application. Bus 500 includes hardware, software, or both, and couples the components of the electronic device together. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable buses, or combinations of two or more of these. Where appropriate, bus 500 may include one or more buses. Although specific buses are described and illustrated in embodiments of this application, this application contemplates any suitable bus or interconnect.

[0072] Furthermore, in conjunction with the construction waste detection methods based on YOLOv11 and wavelet convolution in the above embodiments, this application embodiment can provide a computer storage medium for implementation. This computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the construction waste detection methods based on YOLOv11 and wavelet convolution in the above embodiments. It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.

[0073] The functional blocks shown in the above block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.

[0074] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.

[0075] The aspects of this disclosure have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or can be implemented by a combination of special-purpose hardware and computer instructions.

[0076] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A construction waste detection method based on YOLOv11 and wavelet convolution, characterized in that it comprises: embedding a wavelet convolution module into the Backbone layer of an acquired YOLOv11 network, and embedding three wavelet convolution modules and two CBAM attention modules into the Neck layer of the YOLOv11 network, to construct an object detection model with enhanced capability for processing high-frequency detail information and for cross-scale feature interaction, wherein each of the two CBAM attention modules is adjacent to a wavelet convolution module; inputting the collected images of construction waste into the Backbone layer of the object detection model, where the wavelet convolution module uses the multi-resolution analysis characteristics of the wavelet transform to capture high-frequency detail information of the image that exceeds a preset frequency threshold, extracts multi-scale features, and generates a first feature map; inputting the first feature map into the Neck layer of the object detection model and fusing first feature maps of different depths, where, before each scale output of the path from high-level features to low-level features and of the path from low-level features to high-level features, the wavelet convolution modules and CBAM attention modules embedded in the Neck layer respectively capture and enhance feature information of different frequencies and dynamically learn channel and spatial attention weights, to generate a second feature map; and inputting the second feature map into the Head layer of the object detection model, predicting the target category and bounding box based on the second feature map, and performing target detection of construction waste on feature maps at three different scales using three parallel detection heads.

2. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 1, characterized in that, in the object detection model, the Backbone layer includes, in sequence from top to bottom, a first convolution module, a second convolution module, a first C3k2 module, the wavelet convolution module, a third convolution module, a second C3k2 module, a fourth convolution module, a third C3k2 module, a fifth convolution module, a fourth C3k2 module, a fast spatial pyramid pooling (SPPF) module, and a parallel spatial attention convolution (C2PSA) module, forming a continuous feature downsampling and extraction path; the wavelet convolution module uses the discrete wavelet transform to process image signals, the discrete wavelet transform decomposing the image signal into multiple sub-signals of different frequencies; the multiple sub-signals are divided into low-frequency approximation signals and high-frequency detail signals based on the preset frequency threshold, wherein the low-frequency approximation signals characterize the contour and overall structure of the image, and the high-frequency detail signals contain image edge information and image texture information.

3. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 2, characterized in that, in the object detection model, the Neck layer includes a first uplink path, a second uplink path, a first downlink path, a second downlink path, and a third downlink path; in the first uplink path, the output of the parallel spatial attention convolution (C2PSA) module passes through a first upsampling module and is then concatenated and fused with the output of the third C3k2 module in a first concatenation; in the second uplink path, the output of the first concatenation passes sequentially through the fifth C3k2 module and a second upsampling module, and is then concatenated and fused with the output of the second C3k2 module in a second concatenation; in the first downlink path, the output of the second concatenation passes sequentially through the sixth C3k2 module, a wavelet convolution module, and one CBAM attention module before being output to the first detection head of the Head layer; in the second downlink path, the output of that CBAM attention module passes through the sixth convolution module and is then concatenated and fused with the output of the fifth C3k2 module, and the concatenated output passes sequentially through the seventh C3k2 module, a wavelet convolution module, and the other CBAM attention module before being output to the second detection head of the Head layer; in the third downlink path, the output of the other CBAM attention module passes through the seventh convolution module and is then concatenated and fused with the output of the C2PSA module, and the concatenated output passes sequentially through the eighth C3k2 module and a wavelet convolution module before being output to the third detection head of the Head layer.
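The topology recited in claims 2 and 3 can be sanity-checked by tracing feature-map strides through the Backbone and Neck. The sketch below is illustrative only and rests on assumptions that the claims do not state: every plain convolution module (including the sixth and seventh in the Neck) is taken to downsample by 2, as in the publicly available YOLO11 configuration, while C3k2, wavelet convolution, SPPF, C2PSA, and CBAM modules are taken to preserve resolution. The names `BACKBONE`, `backbone_strides`, `concat`, and `neck_head_strides` are hypothetical.

```python
# Backbone order as recited in claim 2 (names are illustrative labels).
BACKBONE = [
    "Conv_1", "Conv_2", "C3k2_1", "WTConv", "Conv_3", "C3k2_2",
    "Conv_4", "C3k2_3", "Conv_5", "C3k2_4", "SPPF", "C2PSA",
]


def backbone_strides(order=BACKBONE):
    """Return the cumulative stride at the output of each backbone module.

    Assumption: only the plain convolution modules halve the resolution.
    """
    stride, out = 1, {}
    for name in order:
        if name.startswith("Conv_"):
            stride *= 2
        out[name] = stride
    return out


def concat(a, b):
    """Concatenation is only valid when both inputs share a resolution."""
    if a != b:
        raise ValueError(f"stride mismatch: {a} vs {b}")
    return a


def neck_head_strides():
    """Trace the five Neck paths of claim 3 and return the stride fed to each head."""
    s = backbone_strides()
    # First uplink: C2PSA -> upsample -> concat with third C3k2.
    cat_1 = concat(s["C2PSA"] // 2, s["C3k2_3"])
    c3k2_5 = cat_1  # fifth C3k2 keeps resolution
    # Second uplink: fifth C3k2 -> upsample -> concat with second C3k2.
    cat_2 = concat(c3k2_5 // 2, s["C3k2_2"])
    # First downlink: sixth C3k2 -> wavelet conv -> CBAM -> head 1.
    cbam_a = cat_2
    # Second downlink: CBAM -> sixth conv (stride 2) -> concat with fifth C3k2.
    cat_3 = concat(cbam_a * 2, c3k2_5)
    cbam_b = cat_3  # seventh C3k2 -> wavelet conv -> CBAM -> head 2
    # Third downlink: CBAM -> seventh conv (stride 2) -> concat with C2PSA.
    cat_4 = concat(cbam_b * 2, s["C2PSA"])
    return {"head_1": cbam_a, "head_2": cbam_b, "head_3": cat_4}
```

Under these assumptions every concatenation joins maps of equal resolution, and the three detection heads receive maps at three distinct scales, which is consistent with the three parallel detection heads of claim 1.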

4. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 3, characterized in that the CBAM attention module is an attention mechanism module used to enhance the performance of convolutional neural networks (CNNs); the CBAM attention module includes a channel attention module and a spatial attention module; by introducing channel attention and spatial attention into the CNN, the perceptual ability of the model is improved, thereby improving performance without increasing network complexity; the channel attention is used to enhance the feature representation of different channels, and the spatial attention is used to extract information from different spatial locations; the channel attention module and the spatial attention module can be embedded in different layers of the CNN to enhance feature representation.
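For orientation, the channel-then-spatial attention of claim 4 can be sketched in NumPy. The claim does not fix the internals, so this sketch assumes the commonly published CBAM formulation: a shared two-layer MLP over average- and max-pooled channel descriptors, followed by a single k×k convolution over channel-wise average and max maps. The weights `w1`, `w2`, and `kernel` are hypothetical placeholders for learned parameters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """x: (C, H, W). Returns per-channel weights of shape (C,)."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))

    def mlp(v):
        # Shared two-layer MLP with ReLU (assumed structure).
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(x, kernel):
    """x: (C, H, W), kernel: (2, k, k). Returns per-pixel weights of shape (H, W)."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    # All k x k windows around every pixel: shape (2, H, W, k, k).
    win = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(1, 2))
    return sigmoid(np.einsum("chwij,cij->hw", win, kernel))


def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None]
```

Both attention maps lie in (0, 1), so the module only rescales features and leaves the shape of the feature map unchanged, which is why it can be placed directly after a wavelet convolution module.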

5. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 1, characterized in that the wavelet convolution module is used for: performing a two-dimensional first-order Haar wavelet transform by combining operations along the width and height spatial dimensions of the input image, using four 2×2 filters as four convolution kernels that form a set of orthonormal bases, the four convolution kernels being a low-pass filter $f_{LL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and three high-pass filters $f_{LH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$, $f_{HL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$, and $f_{HH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$; performing a depthwise convolution with a stride of 2 on the input image, the input image being convolved with each of the four convolution kernels to output feature maps of four channels: $[X_{LL}, X_{LH}, X_{HL}, X_{HH}] = \mathrm{Conv}([f_{LL}, f_{LH}, f_{HL}, f_{HH}], X)$, where $X$ is the input image, $\mathrm{Conv}$ denotes convolution, the resolution of the feature map of each of the four channels is half the resolution of the input image, $X_{LL}$ is the low-frequency component of the input image that is below a preset frequency threshold, $X_{LH}$ is the horizontal high-frequency component of the input image that exceeds the preset frequency threshold, $X_{HL}$ is the vertical high-frequency component of the input image that exceeds the preset frequency threshold, and $X_{HH}$ is the diagonal high-frequency component of the input image that exceeds the preset frequency threshold; performing the inverse wavelet transform by transposed convolution, the transposed convolution using the four convolution kernels to reconstruct the feature map; and achieving cascaded wavelet decomposition by recursively decomposing the low-frequency component.
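The decomposition, reconstruction, and cascading steps of claim 5 can be illustrated with a minimal NumPy sketch for a single-channel map. It assumes the standard orthonormal 2×2 Haar filters and the LL/LH/HL/HH labeling; the function names `haar_dwt2`, `haar_idwt2`, and `cascade` are hypothetical. The stride-2 convolution is written as a contraction over non-overlapping 2×2 blocks, which is equivalent for a 2×2 kernel with stride 2.

```python
import numpy as np

# Standard orthonormal Haar analysis filters (assumed LL, LH, HL, HH order).
HAAR = 0.5 * np.array([
    [[1, 1], [1, 1]],     # LL: low-pass
    [[1, -1], [1, -1]],   # LH: horizontal detail
    [[1, 1], [-1, -1]],   # HL: vertical detail
    [[1, -1], [-1, 1]],   # HH: diagonal detail
], dtype=np.float64)


def haar_dwt2(x):
    """Stride-2 convolution of x (H, W; H and W even) with the four filters.

    Returns an array of shape (4, H/2, W/2).
    """
    h, w = x.shape
    # blocks[i, j, a, b] == x[2*i + a, 2*j + b]
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    return np.einsum("kab,ijab->kij", HAAR, blocks)


def haar_idwt2(sub):
    """Stride-2 transposed convolution with the same filters (exact inverse)."""
    blocks = np.einsum("kab,kij->ijab", HAAR, sub)
    h2, w2 = sub.shape[1:]
    return blocks.transpose(0, 2, 1, 3).reshape(2 * h2, 2 * w2)


def cascade(x, levels):
    """Recursively decompose the low-frequency component `levels` times."""
    low, details = x, []
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(low)
        details.append((lh, hl, hh))
        low = ll
    return low, details
```

Because the four filters are orthonormal, the transposed convolution reconstructs the input exactly, and each cascade level halves the resolution of the remaining low-frequency map.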

6. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 1, characterized in that the wavelet convolution module incorporates an adjustment unit, which is used for: for each local feature region to be decomposed in the wavelet convolution module, analyzing the numerical fluctuation within the region and calculating the degree of local numerical change, to determine whether the current region is affected by occasional data deviations in the storage unit and whether it contains jitter signals or blurred signals caused by noise; setting a sliding window for each input feature map and, for each pixel in the sliding window, calculating the numerical dispersion of the surrounding local area to obtain a local numerical fluctuation level; and comparing the local numerical fluctuation level with a preset fluctuation threshold and making adaptive parameter adjustments, to generate a purified high-frequency sub-feature map that exceeds the preset frequency threshold, so as to characterize the edges and textures of construction waste in aerial images and avoid artifacts or blurring caused by data deviation.

7. The construction waste detection method based on YOLOv11 and wavelet convolution as described in claim 6, characterized in that the adaptive parameter adjustment includes: for local areas where the local numerical fluctuation level exceeds the preset fluctuation threshold, setting a detail retention threshold for the generation of high-frequency subbands, to suppress irregular jitter or blurred signals caused by data deviation; and for local areas where the local numerical fluctuation level does not exceed the preset fluctuation threshold, retaining the original decomposition strategy so that all real detail information is fully extracted.
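One possible reading of the adjustment unit in claims 6 and 7 is sketched below. The claims leave several choices open, so the following are assumptions rather than the claimed implementation: the local standard deviation in a sliding window serves as the numerical dispersion measure, each 2×2 block to be decomposed takes the largest fluctuation level of its pixels, and the detail retention threshold is applied as soft-thresholding of the high-frequency subbands. The names `local_std`, `purify_high_subbands`, `fluct_threshold`, and `retain_threshold` are hypothetical.

```python
import numpy as np


def local_std(x, win=3):
    """Standard deviation of the win x win neighborhood of every pixel of x (H, W)."""
    pad = win // 2
    padded = np.pad(x, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (win, win))
    return windows.std(axis=(-2, -1))


def purify_high_subbands(x, highs, fluct_threshold, retain_threshold, win=3):
    """Suppress high-frequency coefficients only where the input fluctuates strongly.

    x:     input feature map, shape (H, W) with even H and W
    highs: high-frequency subbands of x, shape (3, H/2, W/2)
    """
    h, w = x.shape
    fluct = local_std(x, win)
    # Fluctuation level of each 2x2 block that one wavelet step decomposes.
    block_fluct = fluct.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
    mask = block_fluct > fluct_threshold
    # Soft-thresholding (assumed form of the "detail retention threshold").
    shrunk = np.sign(highs) * np.maximum(np.abs(highs) - retain_threshold, 0.0)
    # Regions below the fluctuation threshold keep the original coefficients.
    return np.where(mask[None], shrunk, highs)
```

In this reading, regions whose fluctuation level stays at or below the threshold pass through unchanged, which corresponds to retaining the original decomposition strategy in claim 7.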

8. A construction waste detection system based on YOLOv11 and wavelet convolution, characterized in that it comprises: a model building module, used to embed a wavelet convolution module into the Backbone layer of an acquired YOLOv11 network, and to embed three wavelet convolution modules and two CBAM attention modules into the Neck layer of the YOLOv11 network, to build an object detection model with enhanced capability for processing high-frequency detail information and for cross-scale feature interaction, wherein each of the two CBAM attention modules is adjacent to a wavelet convolution module; a first feature processing module, used to input the collected construction waste image into the Backbone layer of the object detection model, and to use the wavelet convolution module, by utilizing the multi-resolution analysis characteristics of the wavelet transform, to capture high-frequency detail information of the image that exceeds a preset frequency threshold, extract multi-scale features, and generate a first feature map; a second feature processing module, used to input the first feature map into the Neck layer of the object detection model, fuse first feature maps of different depths, and, before each scale output of the path from high-level features to low-level features and of the path from low-level features to high-level features, capture and enhance feature information of different frequencies through the wavelet convolution modules and CBAM attention modules embedded in the Neck layer, dynamically learn channel and spatial attention weights, and generate a second feature map; and a target detection module, used to input the second feature map into the Head layer of the object detection model, predict the target category and bounding box based on the second feature map, and perform target detection of construction waste on feature maps at three different scales using three parallel detection heads.

9. An electronic device, characterized in that it comprises: a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the construction waste detection method based on YOLOv11 and wavelet convolution as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program or instructions that, when executed by a processor, implement the construction waste detection method based on YOLOv11 and wavelet convolution as described in any one of claims 1 to 7.