Multi-target intelligent detection method for apparent disease of concrete structure based on improved YOLOv8

By introducing a multi-path collaborative feature enhancement module into the YOLOv8 model, the problems of inaccurate feature extraction and missed detection of small targets in concrete structure appearance detection under complex backgrounds are solved, and high-precision and robust multi-target defect identification is achieved.

CN122200084A · Pending · Publication Date: 2026-06-12 · NANTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANTONG UNIV
Filing Date
2026-03-06
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing YOLO models suffer from problems such as inaccurate feature extraction, high false negative rate for small targets, and insufficient multi-target discrimination ability in the detection of apparent defects in concrete structures under complex backgrounds, making it difficult to meet the requirements of high accuracy and robustness.

Method used

In the YOLOv8 model, a multi-path collaborative feature enhancement module is introduced into the feature extraction backbone network and feature fusion network. This module combines multi-residual coupled multi-scale feature paths with a large kernel separable attention mechanism. Through multi-scale convolution kernels and deconvolution operations, the feature capture capability is improved. Furthermore, a hybrid loss function and dynamic label allocation strategy are adopted to optimize model performance.

Benefits of technology

It significantly improves the accuracy and robustness of the model in multi-target disease detection under complex backgrounds, especially the recognition effect of small-sized targets, and improves recall, precision and overall index. It also adapts to changes in illumination and viewing angle differences and enhances robustness against environmental noise.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a multi-target intelligent detection method for apparent defects of concrete structures based on an improved YOLOv8. The core improvement is to deploy a multi-path collaborative feature enhancement module in the C2f modules of the feature extraction backbone network and the feature fusion network of the YOLOv8 model, replacing the original structure. The multi-path collaborative feature enhancement module consists of multi-residual coupled multi-scale feature paths and a large-kernel separable attention mechanism; it strengthens detail capture, suppresses noise, and improves the model's ability to recognize small defects and low-contrast targets. Experiments show that, compared with the original YOLOv8, the improved model achieves significantly higher precision, recall, F1 score, and mean average precision in detecting concrete cracks, spalling, exposed rebar, and other defects. By fusing the batch normalization and activation layers, inference efficiency is also optimized, providing an improved technical path for high-precision, highly robust identification of apparent defects in existing concrete buildings in complex scenarios.

Description

Technical Field

[0001] This invention belongs to the field of structural health monitoring technology, specifically involving a high-precision multi-target defect identification scheme based on computer vision and deep learning, which is used for the optimization of detection of various apparent defects in concrete structures, and is especially suitable for intelligent detection of multi-target defects such as concrete cracks, spalling, and exposed rebar in existing buildings.

Background Technology

[0002] During long-term use or when subjected to sudden loads such as earthquakes or impacts, buildings experience varying degrees of structural damage, ranging from minor cracks to concrete spalling, then to exposed rebar and ultimately, complete collapse. Cracks are the initial cause of spalling; concrete spalling leads to exposed rebar, which is susceptible to environmental erosion, inducing and accelerating corrosion. Rebar corrosion, in turn, causes the rebar to expand, leading to concrete cracking and even spalling. Cracks reduce the effective load-bearing cross-section of building components, and spalling further weakens the stiffness of these components, particularly impacting compression members. This vicious cycle and dynamic expansion of these defects significantly increase the difficulty of maintaining existing buildings. To promptly assess building safety, efficient and accurate multi-target defect detection is necessary.

[0003] Currently, the detection of various defects in existing buildings faces challenges such as significant differences in target features, severe occlusion, and complex lighting conditions. Traditional methods based on manual features are no longer sufficient to meet the requirements for real-time performance and accuracy. Detection technology is evolving from traditional manual inspection to multi-target and intelligent methods. Multi-target defect detection refers to the comprehensive identification of different defects by simultaneously combining information from multiple defect image layers and utilizing technologies with varying spatial resolutions and perception accuracies.

[0004] Advances in deep learning technology have driven the development of lightweight object detection networks. Deep learning-based computer vision technology, with its automation and efficiency, has provided new impetus and become a research hotspot for intelligent detection of multi-target defects in concrete surface damage (such as cracks, spalling, and exposed rebar). The YOLO series of algorithms is widely used due to its ability to directly locate targets and identify categories, as well as its high efficiency. However, it also has some limitations: the receptive field of its convolutional layers is limited, resulting in insufficient ability to identify defect features in images with complex backgrounds. Furthermore, concrete surface damage datasets are usually captured manually, resulting in problems such as varying viewing distances, significant differences in defect features, and blurred lighting. These factors cause YOLO to perform poorly when processing such data, with a significant decrease in precision and recall, making it difficult to meet practical detection needs.

[0005] Secondly, the existing YOLO structure is not adapted to the unique target distribution in architectural scenarios, lacking spatial attention mechanisms or category-differentiated detection strategies, which easily leads to target confusion and missed detections. Furthermore, when faced with data manifolds with complex geometric characteristics (such as mixed surfaces containing both spherical convexities and hyperbolic concavities), the traditional single-path network architecture cannot form the necessary dimensional entanglement, resulting in the loss of key geometric features (such as curvature sign reversal and multi-connected holes) during the mapping process. The unidirectional nature of gradient propagation further constrains parameter updates within the initial topological boundary, limiting expressive power and resulting in poor performance in identifying multi-target concrete surface damage in complex backgrounds.

[0006] Taking all the above factors into consideration, the problem to be solved in this field is how to enable convolutional neural network models to meet the detection requirements for images of concrete structure surface defects under complex backgrounds and to improve the recognition accuracy for multi-target defect images of buildings, that is, to improve the detection accuracy and robustness for typical surface defects such as cracks, spalling, leakage, and steel corrosion.

Summary of the Invention

[0007] Objective: To develop an intelligent multi-target detection method for apparent defects in concrete structures based on improved convolutional neural networks and deep learning. Addressing the issues of information loss and insufficient focusing ability in the extraction of multi-scale, irregular defects in concrete using existing convolutional neural networks, which easily leads to missed detection of small targets and false detection of complex targets, this invention introduces an innovative multi-path collaborative feature (multi-residual coupling) enhancement module into the feature extraction network.

[0008] Technical solution: The present invention provides a multi-objective intelligent detection method for apparent defects in concrete structures based on an improved YOLOv8, comprising the following steps:

[0009] Step 1: Dataset collection and processing: Collect images including concrete cracks, spalling, and exposed rebar. Expand the dataset by cropping, scaling proportionally, and data augmentation. Use Labelme for fine annotation and divide the dataset into training, validation, and test sets.

[0010] Step 2, Model Improvement: Based on the YOLOv8 model, a multi-path collaborative feature enhancement module is deployed in the C2f module of its feature extraction backbone network and feature fusion network to replace the original standard Bottleneck structure. The multi-path collaborative feature enhancement module consists of multi-residual coupled multi-scale feature paths and a large kernel separable attention mechanism. The multi-residual coupled multi-scale feature paths enrich the captured features by coupling multiple target features.

[0011] Step 3, Model Training and Parameter Optimization: Based on the improved YOLOv8 model, set the training parameters and anti-overfitting strategies, train the model on the training set, and dynamically adjust the parameters according to the validation-set precision (P), recall (R), F1 score, and mean average precision (mAP) to balance the model's accuracy and generalization ability.
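
The validation metrics named in Step 3 can be sketched as follows. This is a minimal illustration using the usual definitions of precision, recall, and F1 from true-positive, false-positive, and false-negative counts, with mAP taken as the unweighted mean of per-class average precision. The function names and the example counts are hypothetical and are not taken from the patent.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from TP / FP / FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"P": precision, "R": recall, "F1": f1}


def mean_average_precision(ap_per_class: dict) -> float:
    """mAP as the unweighted mean of per-class average precision values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical counts and AP values, for illustration only (not patent results).
m = detection_metrics(tp=80, fp=20, fn=10)
map_value = mean_average_precision({"crack": 0.9, "spalling": 0.8, "rebar": 0.7})
```

Tracking all four values on the validation set, rather than a single one, makes it visible when a parameter change trades recall for precision or the reverse.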

[0012] Step 4: Multi-target defect detection: Input the preprocessed concrete image to be detected into the trained improved model, and output the identification results of concrete surface defects.

[0013] Furthermore, the YOLOv8 model adopts a modular four-stage architecture comprising an input layer, a backbone network, a neck feature fusion layer, and a head detection output layer. Its technical features include: a reconstructed C2f module that uses dual-branch heterogeneous processing to enhance gradient propagation and feature reuse; a decoupled detection head that separates the classification and regression tasks to reduce interference; a dynamic label allocation strategy that improves positive-sample recall; and a hybrid loss function integrating Varifocal Loss (VFL) and Distribution Focal Loss (DFL) to ensure stable optimization in complex scenarios. The classification loss is calculated using VFL, while the global category loss uses Binary Cross-Entropy (BCE) loss. The localization loss combines the Complete Intersection over Union (CIoU) loss with DFL. The CIoU loss measures the difference between the predicted bounding box and the ground-truth bounding box. The Distance Intersection over Union (DIoU) loss penalizes the Euclidean distance between the center points of the two boxes; building on DIoU, the CIoU loss further incorporates the aspect ratio of the bounding box to improve localization accuracy. The specific expression of the loss function is as follows:

[0014] $L_{CIoU} = 1 - IoU + \dfrac{\rho^{2}}{b^{2}} + \alpha\upsilon$ (1)

[0015] $\alpha = \dfrac{\upsilon}{(1 - IoU) + \upsilon}$ (2)

[0016] $\upsilon = \dfrac{4}{\pi^{2}}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right)^{2}$ (3)

[0017] In the above formulas, ρ is the Euclidean distance between the center point of the ground-truth bounding box and the center point of the predicted bounding box; b is the diagonal length of the smallest region enclosing both boxes; α is the weighting function; υ is the aspect-ratio similarity measure of the bounding boxes; IoU is the intersection over union, and $IoU - \rho^{2}/b^{2}$ is the distance intersection over union (DIoU); $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth bounding box; and w and h are the width and height of the predicted bounding box.
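
As an illustration of the CIoU localization loss described above, the following is a minimal sketch assuming the commonly published CIoU formulation (IoU term, normalized center-distance term, and weighted aspect-ratio term). The function name, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example, not details given in the patent.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    w, h = px2 - px1, py2 - py1
    w_gt, h_gt = gx2 - gx1, gy2 - gy1

    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = w * h + w_gt * h_gt - inter
    iou = inter / (union + eps)

    # Squared distance between box centers
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    diag2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio similarity term and its weight
    v = (4 / math.pi ** 2) * (math.atan(w_gt / (h_gt + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / diag2 + alpha * v
```

For identical boxes the loss is approximately zero; for two disjoint boxes of the same shape the IoU and aspect-ratio terms vanish and only the center-distance penalty is added to 1, which is what lets the loss keep providing a gradient when boxes do not overlap.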

[0018] Furthermore, the C2f module is divided into three typical stages based on its semantic abstraction capabilities at different levels: low-level feature extraction - layer 2, mid-level feature integration - layers 4 and 6, and high-level semantic modeling - layer 8, forming a progressive feature pyramid system; the multi-path collaborative feature enhancement module is preferentially deployed in the low-level and mid-level C2f modules.

[0019] Furthermore, the multi-path collaborative feature enhancement module introduces multi-scale convolution kernels and deconvolution operations, combined with residual connection mechanisms and bottleneck channel compression strategies, to improve representation capabilities and detection performance while controlling the number of model parameters; the multi-residual coupled multi-scale feature pathways simulate the intersection points in the trefoil topology to construct multi-target feature pathways.

[0020] Furthermore, in step 1, the original dataset is cropped and proportionally scaled to 512×512 pixels, and then augmented using data augmentation methods such as rotation, horizontal flipping, and mirroring. After labeling, it is randomly divided into training set, validation set, and test set in a 7:2:1 ratio, and cross-validation by multiple people ensures that the labeling consistency is ≥95%.
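
The random 7:2:1 division can be sketched as below. This is a minimal illustration using integer ratios and a seeded shuffle; the function name, the seed, and the file names are hypothetical, while 3650 is the augmented image count reported in the example section of this document.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names for illustration.
images = [f"img_{i:04d}.jpg" for i in range(3650)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 2555 730 365
```

Splitting after augmentation, as the text describes, means augmented copies of one source image may land in different subsets; splitting by source image first would be a stricter alternative, though the patent does not state which was used.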

[0021] Furthermore, in step 3, the training parameters are set to batch_size=8, img_size=512, and the training rounds are 300. By designing three sets of comparative experiments of single-layer deployment, two-layer deployment, and multi-layer deployment, the optimal deployment scheme of the multi-path collaborative feature enhancement module is determined.

[0022] Furthermore, the multi-path collaborative feature enhancement module integrates batch normalization and activation layer into a unified computing unit.
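
The patent does not detail how the batch normalization and activation layers are merged. One common way to obtain such a unified unit at inference time is to fold the batch normalization statistics into the weights of the preceding linear operation, so that only one multiply-add remains before the activation. The scalar (single-channel) sketch below illustrates that idea under this assumption; the function names, the SiLU activation, and all parameter values are hypothetical.

```python
import math


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into a preceding linear op; returns (w_fused, b_fused)."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def silu(x):
    """SiLU activation, assumed here as the activation layer."""
    return x / (1.0 + math.exp(-x))


def conv_bn_act(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: linear op, then BN, then activation."""
    y = weight * x + bias
    y = gamma * (y - mean) / math.sqrt(var + eps) + beta
    return silu(y)


def fused_unit(x, w_fused, b_fused):
    """Fused unit: a single multiply-add followed by the activation."""
    return silu(w_fused * x + b_fused)


# Hypothetical parameters for one channel.
params = dict(weight=0.5, bias=0.1, gamma=1.2, beta=-0.3, mean=0.05, var=0.8)
w_f, b_f = fold_bn(**params)
```

Because the folding is an exact algebraic rewrite, the fused unit reproduces the unfused output up to floating-point rounding while skipping the separate normalization step.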

[0023] Furthermore, the experimental steps include: conducting low-cycle reciprocating loading tests on steel frame composite shear wall structures, collecting multi-target apparent damage data of concrete components on-site, comparing the detection performance of YOLOv8, YOLOv8-PSA, YOLOv8-SEAttention, and YOLOv8-DANet with the improved model, and verifying the feasibility and superiority of the improved model.

[0024] The present invention also discloses a computer device, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method of the present invention.

[0025] The present invention also discloses a computer-readable storage medium having a computer program / instructions stored thereon, which, when executed by a processor, implements the steps of the method of the present invention.

[0026] Beneficial Effects: Compared with existing technologies, this invention has the following significant advantages: Comparative experiments show that when the multi-path collaborative feature enhancement module is embedded into the low-level convolutional layers (such as the shallow feature extraction layer in the C2f module) and the middle-level convolutional layer 1 (such as the first-level feature fusion layer of the Bottleneck network) of YOLOv8, the model performance reaches its optimal level. Compared with the original YOLOv8 model, this improved model has the following beneficial effects:

[0027] (1) This model adopts a multi-path collaborative feature enhancement module structure, and combines multi-residual network to simulate the intersection point in the trefoil topology to construct multi-target feature paths, effectively addressing the problem of multi-target feature differences in complex scenarios. At the same time, batch normalization and activation layer are integrated into a unified computing unit to improve inference efficiency, reduce the average execution time by 9.1%, and reduce the GPU core API time consumption by 18.4%, achieving coordinated optimization of accuracy and speed.

[0028] (2) This model integrates the multi-path collaborative feature enhancement module to highlight key features and suppress noise, significantly enhancing target representation capabilities and detection accuracy. The module focuses on key areas of the image, improving the recognition of multi-target defects in concrete structures, especially in scenarios such as minor cracks, small-area concrete spalling, small amounts of exposed rebar, and low-contrast backgrounds. Through the global-local feature fusion module, macroscopic structures and microscopic details are captured simultaneously, ensuring stable detection results under extreme conditions such as low light and large tilt angles, and demonstrating high environmental robustness.

[0029] (3) This model can accurately identify different target defects such as cracks, spalling and exposed steel bars in existing building concrete structures. Compared with the original YOLOv8 model, the optimal detection scheme deployed by the multi-path collaborative feature enhancement module has achieved certain improvements in the detection effect of the above defects in P / R / F1 / mAP.

[0030] (4) The introduction of the multi-path collaborative feature enhancement module significantly improves the model's feature extraction capability for multiple targets, which is crucial for the accurate identification of multi-target defects in complex backgrounds. The low-level network is the key to basic feature extraction, and its quality directly affects subsequent processing; the mid-level network can effectively improve performance by working in conjunction with the low-level network; although the high-level network can enhance discriminability, it should be used with caution to avoid missed detections. Deploying combinations of the multi-path collaborative feature enhancement module according to the characteristics of each level optimizes the overall performance of the model, achieving significant improvements in key indicators such as precision (P), recall (R), F1 score, and mean average precision (mAP). It is particularly noteworthy that the detection precision (P) for small-sized targets is significantly improved, verifying that embedding TC effectively enhances the detection of small targets. On the concrete crack dataset: recall R↑5.9%, precision P↑7.2%, F1 score↑5.3%, mAP↑3.9%; on the concrete spalling dataset: recall R↑1.9%, precision P↑5.3%, F1 score↑3.1%, mAP↑0.8%; on the exposed rebar dataset: recall R↑0.1%, precision P↑2.5%, F1 score↑2.2%, mAP↑1.6%.

Attached Figure Description

[0031] Figure 1 YOLO Multi-Path Collaborative Feature Enhancement Module Architecture Diagram;

[0032] The diagram shows the YOLOv8 architecture (input layer, backbone network, neck feature fusion layer, head detection output layer), the multi-path collaborative feature enhancement module (trefoil convolutional architecture), and how this module replaces the YOLOv8 C2f modules. Here, Conv denotes a convolutional layer; DConv, a deconvolutional layer; LSKA, the large kernel separable attention mechanism; Bottleneck, the standard bottleneck structure; shortcut=True, n=3 indicates that residual connections are enabled and the number of modules is 3; Split denotes feature splitting; Concat, feature concatenation; Input is a 640×640×3 input image; Backbone is the backbone network; Neck is the neck feature fusion layer; Head is the head detection output layer; and 80×80×256 denotes the feature map size and number of channels.

[0033] Figure 2 A multi-objective image set of apparent defects in concrete (partial).

[0034] The images show typical pictures of three types of defects in concrete: cracks, spalling, and exposed rebar, taken by manual inspections and collected online. They visually present the appearance characteristics of different defects and their complex background environments.

[0035] Figure 3 Experimental results of deploying the multi-path collaborative feature enhancement module at different convolutional layer locations and combinations in YOLOv8;

[0036] The figure contains three sub-figures: (a) concrete cracks, (b) concrete spalling, and (c) exposed concrete reinforcement. These sub-figures respectively show the comparative data of four core experiments: Experiment 1 (original model), Experiment 2 (single-layer optimal deployment), Experiment 6 (double-layer optimal deployment), and Experiment 12 (multi-layer optimal deployment) in terms of precision (P), recall (R), F1 score, and mAP. The figure clearly presents the performance differences of different deployment schemes, with Experiment 6 (double-layer optimal deployment) showing the best performance.

[0037] Figure 4 Performance comparison of five detection models on a multi-objective apparent disease dataset of concrete;

[0038] The figure includes three sub-figures: (a) concrete cracks, (b) concrete spalling, and (c) exposed concrete reinforcement. The detection performance of five models (YOLOv8, YOLOv8-PSA, YOLOv8-SEAttention, YOLOv8-DANet, and YOLOv8 with the multi-path collaborative feature enhancement module) was compared, verifying the superiority of the improved model of this invention in the detection of various defects.

Detailed Implementation

[0039] The technical solution of the present invention will be further described below with reference to the accompanying drawings.

[0040] This invention addresses the shortcomings of existing deep learning models in detecting surface defects in concrete structures with complex backgrounds and irregular shapes, including inaccurate feature extraction, high false negative rates for small targets, and insufficient multi-target discrimination capabilities. It discloses an innovative deep learning-based multi-target defect detection model framework. This framework specifically optimizes the general target detection architecture. Its core innovation lies in introducing a multi-path collaborative feature enhancement mechanism and modularly integrating it into key parts of the network, thereby significantly improving the accuracy and robustness in identifying various surface defects such as cracks, spalling, honeycombing, and steel corrosion.

[0041] The technical process used in this invention is described in the specific implementation plan.

[0042] This invention utilizes the YOLOv8 model, which employs a modular four-stage architecture (input layer, backbone network, neck feature fusion layer, and head detection output layer), balancing accuracy and speed through Pareto optimization. Key technical features include: a reconstructed C2f module that enhances gradient propagation and feature reuse through dual-branch heterogeneous processing; a decoupled detection head that separates the classification and regression tasks to reduce interference; a dynamic label allocation strategy that improves positive-sample recall; and a hybrid loss function integrating Varifocal Loss (VFL) and Distribution Focal Loss (DFL) to ensure stable optimization in complex scenarios.

[0043] The multi-path collaborative feature enhancement module consists of multi-residual coupled multi-scale feature pathways and a large-kernel separable attention mechanism. The former enriches the captured features by coupling multiple target features; the latter effectively captures long-distance dependencies and accurately focuses on key information through large-kernel convolution. Together, they construct the overall architecture of the multi-path collaborative feature enhancement module, significantly improving the model's expressive power and performance.

[0044] This invention focuses on the improved design of neural network backbone architecture. Specifically, in the feature extraction backbone network and / or feature fusion network C2f module, one or more multi-path collaborative feature enhancement modules are deployed to replace the original standard Bottleneck structure. This improved module undertakes the core feature extraction function. By introducing multi-scale convolutional kernels and deconvolution operations, combined with residual connection mechanisms and bottleneck channel compression strategies, it significantly improves representation ability and detection performance while controlling the number of model parameters. In the original YOLOv8 architecture, the C2f module, as a key building block, is divided into three typical stages based on its semantic abstraction capabilities at different levels: low-level feature extraction (layer 2), mid-level feature integration (layers 4 and 6), and high-level semantic modeling (layer 8), forming a progressively advancing feature pyramid system. Based on this, with the multi-path collaborative feature enhancement module as the core, a model group with a total of 15 structural variants is constructed using a full permutation combination strategy, covering various configuration modes from single-level local replacement to cross-level collaborative optimization. This initiative aims to systematically explore the mechanism of action of multi-pathway collaborative feature enhancement modules at different network depths, revealing their multi-level influence patterns from point to surface and from independence to coupling, thereby providing theoretical support and practical guidance for the design of lightweight, high-performance detection models.

[0045] The core idea of the technical solution stems from the pivotal position of the C2f module in the feature pyramid and its cross-stage connection characteristics. At the low level, the second-layer C2f module undertakes the primary encoding of the spatial features of the original image, so its lightweight modification directly affects the network's ability to capture fine-grained texture features of apparent damage to concrete components. The middle layers (the fourth and sixth) achieve cross-scale semantic interaction through a two-level feature fusion mechanism; replacing the operator in this region reshapes how multi-resolution features are combined. The high-level eighth layer is a key node for advanced semantic abstraction; structural changes there alter how the overall target representation is constructed. Compared with conventional convolutional units, selecting the C2f module, with its multi-branch interaction characteristics, for modification has two advantages: the gradient splitting mechanism formed by cross-layer connections amplifies the impact of the operator improvement on the network's training dynamics, and the multi-target feature recombination process within the module provides a multi-dimensional window for observing the operator's effect. The scientific rigor of the design is reflected in two respects: (1) By controlling the hierarchical position and combination of module replacements, comparative models with the same number of modules but different distribution patterns are constructed, thereby separating the effects of position and quantity on performance. (2) The C2f module, which has a clear hierarchical division of labor, is selected as the modification carrier to ensure that the variation in network structure stays within an observable range. If only a single conventional convolution layer were replaced, the structural perturbation (about a 0.3% parameter change) would be unlikely to produce a significant performance difference, whereas replacing the C2f module triggers a sufficient model response. This experimental scheme can effectively characterize the performance gain of the innovative module and avoid the risk of dimensional collapse caused by excessive modification of network depth.

[0046] In multi-object detection algorithms, distance metrics are used to measure the similarity or difference between different objects. In this process, object features are typically represented as vectors, and then distance metrics such as Euclidean distance, Mahalanobis distance, and cosine distance are used to compare the similarity between these feature vectors.

[0047] Euclidean distance is one of the most commonly used distance metrics; it measures the straight-line distance between two points. Suppose there are two points $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$ in n-dimensional space. The Euclidean distance between them is:

[0048] $d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^{2}}$

[0049] Mahalanobis distance is a distance metric that considers the covariance structure between data points, and it can address issues related to high-dimensional data and correlations. Assuming we have two vectors X and Y with covariance matrix Σ, the Mahalanobis distance can be expressed as:

[0050] $d_M(X, Y) = \sqrt{(X - Y)^{T}\, \Sigma^{-1}\, (X - Y)}$

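[0051] Cosine similarity measures the cosine of the angle between two vectors and is typically used to compare the similarity of high-dimensional data such as text and images. Assume there are two n-dimensional points $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$. The cosine of the angle between A and B is calculated using the formula below:

[0052] $\cos(A, B) = \dfrac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}$

[0053] Furthermore, the formula for cosine distance can be obtained as follows:

[0054] $d_{\cos}(A, B) = 1 - \cos(A, B)$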
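
The three distance metrics discussed above can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions; the function names are hypothetical, and the Mahalanobis function assumes the caller supplies the inverse covariance matrix directly as a list of rows.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two n-dimensional points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance; cov_inv is the inverse covariance matrix (list of rows)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # Quadratic form d^T * cov_inv * d
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, and the cosine distance ignores vector length entirely, which is why it is often preferred for comparing feature vectors whose magnitudes vary with image brightness or scale.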

[0055] The core innovation of this invention lies in the deployment of one or more multi-path collaborative feature enhancement modules in the feature extraction backbone network and / or feature fusion network. This technology achieves significant improvements in the following three dimensions:

[0056] (1) Architectural innovation and feature extraction optimization:

[0057] The multi-path collaborative feature enhancement module is deployed in the low- and mid-level convolutional layers of YOLOv8 to achieve dual optimization: the low-level layer enhances detail capture, while the mid-level layer enhances the selective attention mechanism to suppress noise. By integrating a multi-residual network simulating a trefoil knot with multi-target feature pathways, the model's representation ability is significantly improved, especially for identifying small-sized targets (such as minor cracks, small-area spalling, and a small amount of exposed rebar) or low-contrast background targets.

[0058] (2) Performance advantages and improved accuracy:

[0059] The introduction of multi-path collaborative feature enhancement modules significantly improves the model's feature extraction capabilities for multiple targets, which is crucial for the accurate identification of multi-target diseases in complex backgrounds. Low-level networks are key to basic feature extraction, and their quality directly affects subsequent processing; mid-level networks, in conjunction with low-level networks, can effectively improve performance; while high-level networks can enhance discriminability, they should be used cautiously to avoid missed detections. The combined deployment of multi-path collaborative feature enhancement modules tailored to different layer characteristics optimizes the overall model performance, achieving significant improvements in key metrics such as precision (P), recall (R), F1 score, and mean AP. Particularly noteworthy is the significant improvement in detection precision (P) for small targets, validating that TC embedding effectively enhances the detection of small targets. On the concrete crack dataset: recall R↑5.9%, precision P↑7.2%, F1 score↑5.3%, mAP↑3.9%; on the concrete spalling dataset: recall R↑1.9%, precision P↑5.3%, F1 score↑3.1%, mAP↑0.8%; on the exposed rebar dataset: recall R↑0.1%, precision P↑2.5%, F1 score↑2.2%, mAP↑1.6%.

[0060] (3) Enhanced engineering value and robustness:

[0061] This scheme demonstrates excellent robustness and generalization ability in complex backgrounds (such as densely textured concrete surfaces), highlighting the advantages of the multi-path collaborative feature enhancement module in improving the model's ability to distinguish complex backgrounds. Secondly, it exhibits high sensitivity to the detection of multi-target surface defects in concrete structures. Compared to similar models, this scheme is more adaptable to changes in lighting, viewing angle, and target size, and is robust to environmental noise. This improved model scheme provides a superior technical approach for solving the challenge of high-precision and high-robust identification of surface defects in existing building concrete under complex scenarios.

[0062] Example:

[0063] A specific embodiment of the present invention is shown in the accompanying drawings. As shown in Figure 1, a dataset of images of concrete cracks, spalling, and exposed rebar in existing buildings was collected via network and manually inspected; the flowchart illustrates the data collection and processing methods. The original dataset collected for this project comprises 1590 images, with defects including concrete cracks, spalling, and exposed rebar (see Figure 2).

[0064] Dataset processing: The original dataset was cropped and proportionally scaled to 512×512 pixels (with gray border padding), and then expanded to 3650 images through data augmentation methods such as rotation, horizontal flipping, and mirroring. All crack, spalling, and exposed rebar areas were finely annotated using Labelme (distinguishing between the different defect types), and cross-validation by multiple researchers ensured annotation consistency of ≥95%. After labeling, the dataset was randomly divided into a training set (2555 images), a validation set (730 images), and a test set (365 images) in a 7:2:1 ratio.
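The scaling and splitting arithmetic in this step can be illustrated with a short sketch. It is not the applicant's code: the function names, the centered placement of the gray padding, and the fixed random seed are assumptions made for illustration. Only the 512×512 target size, the 3650-image total, and the 7:2:1 ratio come from the text.

```python
import random


def letterbox_geometry(width, height, target=512):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving
    resize into a target x target canvas. Centering the image inside the
    gray border is an assumption; the text only says gray border filling."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios.
    The test set receives whatever remains after train and val."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Image indices stand in for the 3650 augmented images.
train, val, test = split_dataset(range(3650))
print(len(train), len(val), len(test))  # 2555 730 365
```

With 3650 items the 7:2:1 split reproduces the set sizes stated above, and a 1024×768 image would be scaled to 512×384 with 64 pixels of padding above and below.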

[0065] Configure relevant parameters: Based on the YOLOv8 architecture with the multi-path collaborative feature enhancement module, optimize the training parameters (batch_size=8, img_size=512, etc.) and anti-overfitting strategies to achieve high-precision detection of multiple target features of concrete; dynamically adjust the parameters according to the validation set metrics (P / R / F1 / mAP) to balance model accuracy and generalization ability.
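As a rough illustration of monitoring validation metrics, the sketch below computes F1 from precision and recall and picks the epoch with the highest F1. The dictionary layout, the function names, and the numbers in `history` are hypothetical placeholders, not experimental results; only batch_size=8, img_size=512, and the 300 training rounds are taken from the document.

```python
# Hypothetical configuration container; the key names are illustrative.
train_config = {"batch_size": 8, "img_size": 512, "epochs": 300}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_epoch(history):
    """history: list of (epoch, precision, recall) tuples.
    Returns the epoch number with the highest validation F1."""
    return max(history, key=lambda row: f1_score(row[1], row[2]))[0]


# Made-up validation readings, used only to exercise the helpers.
history = [(1, 0.70, 0.60), (2, 0.80, 0.75), (3, 0.85, 0.65)]
print(best_epoch(history))  # 2
```

Selecting on F1 rather than on precision or recall alone is one simple way to balance the two; the document does not state which criterion was actually used for the adjustment.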

[0066] Design comparative experiments: With the same parameter settings, the multi-path collaborative feature enhancement module was deployed in the C2f modules of layers 2, 4, 6, and 8 of the YOLOv8 base network, which correspond respectively to the low-level, middle-level 1, middle-level 2, and high-level convolutional layers of YOLOv8, and each configuration was trained for 300 rounds. According to the differences in deployment position and quantity, three groups of experiments were designed: (1) single-layer deployment experiments (experiments 1-5); (2) double-layer deployment experiments (experiments 6-11); (3) multi-layer deployment experiments (experiments 12-16). The optimal deployment scheme was determined by comprehensively analyzing the data from all 16 experiments. To present the detection data for each defect target more clearly, the experimental data analysis diagram (Figure 3) shows four core experiments: Experiment 1 (original model), Experiment 2 (optimal single-layer deployment), Experiment 6 (optimal two-layer deployment), and Experiment 12 (optimal multi-layer deployment). A comprehensive comparison of these four experiments shows that Experiment 6 (optimal two-layer deployment) is the best trilobal convolution deployment scheme among the 16 experiments, with a significant performance improvement over the original model.
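The count of 16 experiments is consistent with enumerating every subset of the four candidate layers: one baseline, four single-layer, six double-layer, and five multi-layer configurations. The sketch below shows that enumeration. The ordering within each group, and therefore which layer combination each experiment number refers to, is an assumption; the document only confirms that Experiment 1 is the original model and gives the group ranges.

```python
from itertools import combinations

# C2f layers named in the text as candidate deployment positions.
LAYERS = (2, 4, 6, 8)


def deployment_schemes(layers=LAYERS):
    """List every deployment scheme as a tuple of layer indices.
    The empty tuple is the unmodified baseline model."""
    schemes = [()]
    for k in range(1, len(layers) + 1):
        schemes.extend(combinations(layers, k))
    return schemes


schemes = deployment_schemes()
for number, scheme in enumerate(schemes, start=1):
    label = "baseline" if not scheme else "layers " + ", ".join(map(str, scheme))
    print(f"experiment {number}: {label}")
```

Under this ordering, entries 1-5 hold the baseline plus the single-layer schemes, entries 6-11 the double-layer schemes, and entries 12-16 the three- and four-layer schemes, matching the group sizes given above.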

[0067] Conduct verification experiments: To verify the recognition performance of the multi-target apparent defect detection model for concrete structures based on the multi-path collaborative feature enhancement mechanism and YOLO algorithm of this invention, a low-cycle reciprocating loading test of a steel frame composite shear wall structure was carried out. Multi-target apparent damage data of the concrete components were collected on site, and the dataset was processed according to step 1. Based on previous research, five detection models were selected for the verification experiment and compared: YOLOv8, YOLOv8-PSA, YOLOv8-SEAttention, YOLOv8-DANet, and YOLOv8 with the multi-path collaborative feature enhancement module. The comparison (Figure 4) verifies the feasibility and superiority of the detection model.

[0068] The above embodiments are merely preferred embodiments of the present invention. It should be noted that those skilled in the art can make several improvements and equivalent substitutions without departing from the principle of the present invention. All such improvements and equivalent substitutions to the claims of the present invention fall within the protection scope of the present invention.

Claims

1. A multi-target intelligent detection method for apparent defects in concrete structures based on an improved YOLOv8, characterized in that it comprises the following steps:
Step 1, Dataset collection and processing: Collect images including concrete cracks, spalling, and exposed rebar. Expand the dataset by cropping, proportional scaling, and data augmentation. Annotate the images with labels, and divide the dataset into training, validation, and test sets.
Step 2, Model improvement: Based on the YOLOv8 model, a multi-path collaborative feature enhancement module is deployed in the C2f module of its feature extraction backbone network and feature fusion network to replace the original standard Bottleneck structure. The multi-path collaborative feature enhancement module consists of multi-residual coupled multi-scale feature paths and a large kernel separable attention mechanism. The multi-residual coupled multi-scale feature paths enrich the captured features by coupling multiple target features.
Step 3, Model training and parameter optimization: Based on the improved YOLOv8 model, set the training parameters and anti-overfitting strategies, train the model using the training set, and dynamically adjust the parameters according to the validation set precision P, recall R, comprehensive index F1, and mean average precision mAP to balance the model's precision and generalization ability.
Step 4, Multi-target defect detection: Input the preprocessed concrete image to be detected into the trained improved model, and output the identification results of concrete surface defects.

2. The method according to claim 1, characterized in that the YOLOv8 model adopts a modular four-stage architecture, including an input layer, a backbone network, a neck feature fusion layer, and a head detection output layer. Its technical features include: a reconstructed C2f module utilizing dual-branch heterogeneous processing to enhance gradient propagation and feature reuse; decoupled detection heads to separate the classification and regression tasks and reduce interference; a dynamic label allocation strategy to improve positive sample recall; and a hybrid loss function integrating Varifocal Loss and Distribution Focal Loss (DFL) to ensure stable optimization in complex scenarios. The classification loss is calculated using the Varifocal Loss (VFL), while the global class loss uses the Binary Cross-Entropy (BCEL) loss function. For the localization loss, a combination of Complete Intersection over Union loss (CIOUL) and Distribution Focal Loss (DFL) is used. CIOUL measures the difference between the predicted bounding box and the ground truth bounding box, while DIOUL measures the Euclidean distance between the center points of the two boxes. Based on DIOUL, CIOUL further incorporates the aspect ratio of the bounding box to improve localization accuracy. The specific expressions of the loss function are given by formulas (1), (2), and (3). In these formulas, ρ represents the Euclidean distance between the center point of the actual bounding box and the center point of the predicted bounding box; b is the diagonal length of the smallest enclosing region; the remaining terms are, in order, a weighting function, a measure of the aspect ratio similarity of the bounding boxes, the distance intersection-over-union ratio, the width of the actual bounding box, and the height of the actual bounding box; w is the width of the prediction box, and h is the height of the prediction box.
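Formulas (1) to (3) are not reproduced in this text. For orientation only, the commonly published form of the CIoU localization loss is written out below using the symbols named in claim 2 (ρ for the center distance, b for the diagonal of the smallest enclosing region, w and h for the predicted box). The symbols α, v, w^{gt}, and h^{gt} are assumed names for the weighting function, the aspect-ratio term, and the actual box width and height. This is a reference sketch and may differ in form or numbering from the applicant's formulas.

```latex
\begin{aligned}
\mathrm{DIoU} &= \mathrm{IoU} - \frac{\rho^{2}}{b^{2}} \\
L_{\mathrm{CIoU}} &= 1 - \mathrm{DIoU} + \alpha v \\
v &= \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2} \\
\alpha &= \frac{v}{(1 - \mathrm{IoU}) + v}
\end{aligned}
```

In this form the penalty grows with the normalized center distance ρ²/b² and with the aspect-ratio mismatch v, which matches the description of CIOUL as DIOUL extended with an aspect-ratio term.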

3. The method according to claim 1, characterized in that, based on their semantic abstraction capabilities at different levels, the C2f modules are divided into three typical stages: low-level feature extraction (layer 2), mid-level feature integration (layers 4 and 6), and high-level semantic modeling (layer 8), forming a progressive feature pyramid system; the multi-path collaborative feature enhancement module is preferentially deployed in the low-level and mid-level C2f modules.

4. The method according to claim 1, characterized in that the multi-path collaborative feature enhancement module improves representation ability and detection performance while controlling the number of model parameters by introducing multi-scale convolution kernels and deconvolution operations, combined with a residual connection mechanism and a bottleneck channel compression strategy; the multi-residual coupled multi-scale feature paths simulate the intersection points in the trefoil topology to construct the multi-target feature paths.

5. The method according to claim 1, characterized in that, in step 1, the original dataset is cropped and scaled proportionally to 512×512 pixels, and then expanded using data augmentation methods such as rotation, horizontal flipping, and mirroring. After labeling, it is randomly divided into a training set, a validation set, and a test set in a 7:2:1 ratio. Multi-person cross-validation ensures that the labeling consistency is ≥95%.

6. The method according to claim 1, characterized in that, in step 3, the training parameters are set to batch_size=8 and img_size=512, and the number of training rounds is 300. By designing three sets of comparative experiments (single-layer deployment, two-layer deployment, and multi-layer deployment), the optimal deployment scheme of the multi-path collaborative feature enhancement module is determined.

7. The method according to claim 1, characterized in that the multi-path collaborative feature enhancement module integrates the batch normalization and activation layers into a unified computing unit.

8. The method according to claim 1, characterized in that it further comprises a verification experiment step: conducting low-cycle reciprocating loading tests on steel frame composite shear wall structures, collecting multi-target apparent damage data of concrete components on site, comparing the detection performance of YOLOv8, YOLOv8-PSA, YOLOv8-SEAttention, and YOLOv8-DANet with that of the improved model, and verifying the feasibility and superiority of the improved model.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the method of claim 1.

10. A computer-readable storage medium having a computer program / instructions stored thereon, characterized in that, when the computer program / instructions are executed by a processor, they implement the steps of the method of claim 1.