Dense debris detection and velocity calculation method based on YOLO and morphological contour extraction

By constructing real and virtual datasets and improving the YOLOv5 model, combined with multi-scale feature extraction and cluster analysis, the problems of missed detection and false detection in dense explosion fragment detection were solved, improving the detection accuracy and speed calculation accuracy of the model, and meeting the needs of high-precision damage performance assessment.

CN122243892A · Pending · Publication Date: 2026-06-19 · NORTHWEST INST OF NUCLEAR TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NORTHWEST INST OF NUCLEAR TECH
Filing Date
2026-03-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies suffer from missed detections and false detections in the detection of densely packed explosive fragments, and the scarcity of data leads to poor model generalization ability, making it difficult to meet the requirements for high-precision and high-reliability damage performance assessment.

Method used

We employ a YOLO-based and morphological contour extraction approach. By constructing real and virtual datasets, we improve the YOLOv5 model to adapt to the scale distribution of explosion debris. We combine multi-scale feature extraction and cluster analysis with deep learning and traditional algorithms to perform debris detection and velocity calculation.

Benefits of technology

It improves the detection accuracy and velocity calculation accuracy of dense explosion debris, enhances the generalization ability of the model, and achieves high-precision debris identification, stable tracking and accurate velocity measurement.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of computer vision and image processing technology, specifically to a dense debris detection method, velocity calculation method, and detection system based on YOLO and morphological contour extraction. The method includes: constructing a fused dataset; constructing and training an improved YOLOv5 model; detecting debris regions using the trained target YOLOv5 model; and obtaining the final detection results. This invention effectively solves the problem of scarce real data through virtual data generation technology, improving model training effectiveness and generalization ability. Through the fusion design of deep learning and traditional algorithms, it enhances the high-precision detection and localization performance of dense small debris while ensuring the continuity and stability of multi-target tracking, thereby improving the accuracy of velocity calculation for dense explosion debris. This invention enables high-precision identification, stable tracking, and accurate velocity measurement of explosion debris.

Description

Technical Field

[0001] This invention relates to the field of computer vision and image processing technology, specifically to a dense fragment detection method, speed calculation method, and detection system based on YOLO and morphological contour extraction.

Background Technology

[0002] Statistical analysis of the number, spatial distribution, and velocity of dense explosion debris is of significant engineering importance in damage performance assessment. As computer vision is adopted across more fields, image-based debris detection and velocity calculation are increasingly replacing traditional manual observation and sensor sampling. However, current applications still face several key bottlenecks that make it difficult to meet the demand for high-precision, high-reliability assessment.

The first bottleneck is detection accuracy. Mainstream deep learning object detection algorithms perform well in general object recognition, but adapt poorly to dense small targets such as explosion debris. On the one hand, the anchor box sizes and aspect ratios of existing algorithms are mostly designed for targets of conventional size (such as vehicles and pedestrians), whereas explosion debris is generally small, irregular in shape, and highly variable in size. Fixed anchor boxes cannot effectively cover this size distribution, which leads to a large number of missed and false detections. On the other hand, some algorithms introduce multi-scale feature fusion to cope with scale variation, integrating feature information from different levels to improve the recognition of small targets. In explosion scenes, however, the contrast between debris and background (smoke, fire, and complex terrain) is low and pieces of debris severely occlude one another, so existing multi-scale fusion methods struggle to extract the key features of debris, which further reduces detection accuracy.

The second bottleneck is the scarcity of data from real explosion scenes. Explosion experiments are costly and dangerous, and the environmental conditions of each explosion (such as weather, terrain, and explosion angle) are difficult to replicate exactly, so real debris image samples are few and cover a limited range of scenes. Deep learning models depend heavily on large-scale, diverse training data. With limited real samples, a model cannot fully learn the feature patterns of debris, generalizes poorly to different explosion scenes, and cannot consistently output accurate detection results. Supplementing high-quality training data has therefore become a key direction, and constructing virtual debris data that conforms to the characteristics of real explosions has become an important research approach in this field.

Several existing techniques for small target detection and data supplementation provide references for explosion debris detection. In dense small target detection, some techniques use multi-column neural network architectures with filters of different sizes to generate diverse receptive fields that adapt to small targets of different scales; this multi-scale perception concept is a reference for adapting to debris size. Other techniques propose multi-level feature fusion strategies that enhance the complementarity of features at different levels through bidirectional (bottom-up and top-down) feature transfer, improving the completeness of small target feature extraction; this is relevant to the problems of debris being confused with the background and having weak features. For data generation, existing techniques have verified the effectiveness of virtual data in data-scarce scenarios: virtual targets with random appearances and actions are built on simulation platforms and composited into different real-world scenes to generate training data, and models trained this way have outperformed models trained solely on real data in target detection tasks, which indicates a feasible path for generating virtual explosion debris data. In feature optimization for small target detection, some techniques employ adaptive scale selection strategies to finely optimize hard pixels in different regions, making multi-scale feature fusion more targeted; this region-aware optimization also offers insights for improving dense debris detection.

However, current techniques are mostly designed for general small target detection scenarios (such as crowd counting and pedestrian detection) and do not fully consider the special characteristics of explosion debris, such as image blur caused by high debris speed, dynamic interference during the explosion (fire, smoke), and the need for continuous debris trajectories. Existing techniques therefore cannot be applied directly to explosion debris detection and velocity calculation. There is an urgent need to integrate virtual data generation, an improved small target detection network, and multi-target tracking into a complete solution adapted to explosion scenes, overcoming the existing bottlenecks and providing accurate and reliable data support for damage assessment.

[0003] Therefore, a method for dense fragment detection and velocity calculation based on YOLO and morphological contour extraction is needed to solve the above problems.

Summary of the Invention

[0004] To address the problems of missed detections, false detections, and data scarcity in the detection of dense small targets generated by explosions in existing technologies, this invention provides a method for dense fragment detection and velocity calculation based on YOLO and morphological contour extraction to solve the existing problems.

[0005] The first aspect of this invention provides a method for dense fragment detection and velocity calculation based on YOLO and morphological contour extraction, employing the following technical solution:

A sequence of consecutive video frames containing the movement of explosion fragments is obtained from a real explosion experiment. Keyframe video images are extracted from the sequence, the fragment regions in the keyframe video images are labeled, and a real dataset is constructed from the keyframe video images and their corresponding fragment regions. The contours and textures of the fragments in the keyframe video images of the real explosion experiment are extracted and synthesized into backgrounds, with bounding box annotations generated automatically, and a virtual dataset is constructed from each keyframe video image and the labeled fragment bounding boxes. The real dataset and the virtual dataset are merged into a fused dataset.

An improved YOLOv5 model is constructed and trained on the fused dataset to obtain a trained target YOLOv5 model. The improved YOLOv5 model includes a multi-branch dilated convolutional layer and a clustering analysis layer. The multi-branch dilated convolutional layer is used to extract multi-scale features. The clustering analysis layer is used to cluster the ground-truth bounding boxes of fragments in the fused dataset and to adaptively calculate anchor boxes that fit the scale distribution of actual explosion fragments.

Each frame of the continuous video image sequence to be processed is input into the target YOLOv5 model, which outputs the fragment regions of that frame.

A foreground mask is generated using a background modeling method, and noise is removed by morphological operations. The foreground contours are obtained using a contour extraction method, and non-fragment regions are filtered out by area conditions to obtain the target fragment region.

The intersection over union (IoU) of the target fragment region and the fragment region detected by the target YOLOv5 model is computed. If the IoU is greater than a set threshold, the fragment region detected by the target YOLOv5 model is taken as the final detection result; if the IoU is less than or equal to the set threshold, the target fragment region is taken as the final detection result.

[0006] A further technical solution of the present invention is that the step of extracting keyframe video images from a continuous frame video image sequence is as follows: Based on the amount of pixel grayscale change between adjacent video frames, pixels with a grayscale difference greater than 30 are marked as moving pixels, and the ratio of moving pixels to the total number of pixels in the full image is used as the area ratio. Video image frames with an area ratio greater than 5% are used as keyframe video images with intense motion.
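The keyframe rule in [0006] can be illustrated with a minimal NumPy sketch. The grayscale difference threshold of 30 and the 5% area ratio come from the text; the function names, the use of 8-bit grayscale arrays, and the choice to mark the later frame of each adjacent pair as the keyframe are assumptions made for illustration.

```python
import numpy as np

def motion_area_ratio(prev_gray, curr_gray, diff_threshold=30):
    """Fraction of pixels whose grayscale change exceeds diff_threshold."""
    # Cast to a signed type so subtracting uint8 frames cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    moving = diff > diff_threshold
    return float(moving.mean())

def select_keyframes(frames, diff_threshold=30, area_threshold=0.05):
    """Indices of frames whose motion area ratio against the previous frame exceeds area_threshold.

    Assumption: the later frame of each adjacent pair is the one kept as a keyframe.
    """
    keyframes = []
    for i in range(1, len(frames)):
        ratio = motion_area_ratio(frames[i - 1], frames[i], diff_threshold)
        if ratio > area_threshold:
            keyframes.append(i)
    return keyframes
```

For a 10 x 10 frame, ten changed pixels give an area ratio of 10% and the frame is kept, while two changed pixels give 2% and the frame is skipped.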

[0007] A further technical solution of the present invention is to extract the contours and textures corresponding to the fragments in each frame of the video image of the real explosion experiment by pixel-by-pixel segmentation, synthesize them into the background to automatically generate bounding box annotations, and construct a virtual dataset.

[0008] A further technical solution of the present invention is to extract the contours and textures corresponding to the fragments in each frame of the video image of the real explosion experiment by pixel-by-pixel segmentation, synthesize them into the background to automatically generate bounding box annotations, and construct a virtual dataset.

[0009] A further technical solution of the present invention is that the spatial pyramid pooling layer in the improved YOLOv5 model adopts a dilated convolutional pyramid layer.

[0010] A further technical solution of the present invention is that the clustering analysis layer uses the K-means clustering analysis algorithm to cluster the true bounding boxes of the fragments in the fused dataset, adaptively calculates and generates anchor box parameters that fit the actual scale distribution of explosion fragments, and replaces the original default anchor boxes of the model with the anchor box parameters.
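As a rough illustration of the anchor clustering in [0010], the sketch below runs plain K-means over ground-truth box sizes. The text specifies only that K-means is used; the Euclidean distance on (width, height), the number of anchors, the initialisation, and the function name are assumptions (a 1 - IoU distance is a common alternative for anchor clustering).

```python
import numpy as np

def kmeans_anchors(box_sizes, k=9, iterations=100, seed=0):
    """Cluster (width, height) pairs with plain K-means; return k anchors sorted by area."""
    boxes = np.asarray(box_sizes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct ground-truth boxes.
    centres = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iterations):
        # Euclidean distance from every box to every centre, shape (n, k).
        dists = np.linalg.norm(boxes[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = boxes[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centre
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Sort by area so the anchors can be assigned from small to large scales.
    return centres[np.argsort(centres.prod(axis=1))]
```

The returned rows would replace the model's default anchor sizes; how they are distributed across detection scales is not stated in the text.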

[0011] A further technical solution of the present invention is to perform data augmentation operations during the training process of the improved YOLOv5 model. The data augmentation operations include random flipping, rotation, scaling, and color parameter adjustment.

[0012] A second aspect of the present invention provides a velocity calculation method based on dense fragment detection using YOLO and morphological contour extraction, comprising: calculating the time interval between adjacent frames from the camera's frame rate, and dividing the displacement of a fragment between adjacent frames by that interval to obtain its single-frame velocity; averaging all single-frame velocities along the motion trajectory of the same fragment to obtain the average velocity of the fragment, where the trajectory is obtained from consecutive video frames by the dense fragment detection method provided in the first aspect of the present invention; and calculating the ratio between pixels and actual physical scale from the camera calibration parameters, and converting the pixel velocity into the actual physical velocity of the fragment.

[0013] A further technical solution of the present invention is to not perform velocity statistics on targets whose motion trajectory length is less than two frames.
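The velocity calculation in [0012] and the trajectory length rule in [0013] can be sketched as follows. The track layout (a list of `(frame_index, x, y)` tuples with strictly increasing frame indices), the single scalar `metres_per_pixel` scale, and the function name are assumptions for illustration; a real calibration may need a position-dependent scale.

```python
import math

def fragment_speed(track, frame_rate_hz, metres_per_pixel, min_frames=2):
    """Average physical speed of one fragment, or None if the track is too short.

    track: list of (frame_index, x, y) in pixel coordinates, ordered by frame.
    """
    if len(track) < min_frames:
        return None  # trajectories shorter than two frames are excluded
    speeds = []
    for (f0, x0, y0), (f1, x1, y1) in zip(track, track[1:]):
        dt = (f1 - f0) / frame_rate_hz              # time interval from the frame rate
        displacement = math.hypot(x1 - x0, y1 - y0) # pixel displacement between frames
        speeds.append(displacement / dt)            # single-frame pixel velocity
    mean_pixel_speed = sum(speeds) / len(speeds)
    return mean_pixel_speed * metres_per_pixel      # convert to physical units
```

For example, a fragment that moves 5 pixels per frame at 1000 frames per second, with an assumed scale of 0.002 m per pixel, has a pixel velocity of 5000 px/s and a physical velocity of 10 m/s.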

[0014] A further technical solution of the present invention is that the step of obtaining the motion trajectory of the same fragment in consecutive frame video images is as follows: The method for dense fragment detection based on YOLO and morphological contour extraction, provided in the first aspect of this invention, obtains the position, bounding box, and frame index of fragment regions in consecutive frame video images. Based on the positional similarity between each pair of fragment regions in two adjacent video frames, the motion trajectory of each fragment in a continuous video image sequence is obtained.

[0015] A third aspect of the present invention provides a dense fragment detection system based on YOLO and morphological contour extraction, characterized in that the detection system comprises: The dataset construction module is used to acquire a sequence of continuous frame video images containing the movement of explosive fragments from a real explosion experiment; extract keyframe video images from the continuous frame video image sequence, annotate the fragment regions in the keyframe video images, and construct a real dataset based on the keyframe video images and their corresponding fragment regions; extract the contours and textures corresponding to the fragments in the keyframe video images of the real explosion experiment, and synthesize them into the background to automatically generate bounding box annotations; construct a virtual dataset based on each keyframe video image and the annotated fragment bounding boxes; and fuse the real dataset and the virtual dataset into a fused dataset. An improved YOLOv5 model construction and training module is used to build an improved YOLOv5 model and train it on a fused dataset to obtain a trained target YOLOv5 model. The improved YOLOv5 model includes: a multi-branch dilated convolutional layer and a clustering analysis layer. The multi-branch dilated convolutional layer is used to extract multi-scale features; the clustering analysis layer is used to cluster the true bounding boxes of debris in the fused dataset, adaptively calculate and generate anchor boxes that fit the scale distribution of actual explosion debris. 
The detection result output module is used to input each frame of the continuous video image sequence to be processed into the target YOLOv5 model and output the corresponding fragment region in each frame of the video image; generate a foreground mask using a background modeling method and remove noise through morphological operations; obtain the foreground contours using a contour extraction method, filter out non-fragment regions by area conditions, and obtain the target fragment region; and compute the intersection over union (IoU) of the target fragment region and the fragment region detected by the target YOLOv5 model. If the IoU is greater than a set threshold, the fragment region detected by the target YOLOv5 model is taken as the final detection result; if the IoU is less than or equal to the set threshold, the target fragment region is taken as the final detection result.

[0016] The beneficial effects of this invention are: This invention effectively solves the problem of scarcity of real data through virtual data generation technology, improving the model training effect and generalization ability. Through the fusion design of deep learning and traditional algorithms, it not only enhances the high-precision detection and positioning performance of dense small fragments, but also ensures the continuity and stability of multi-target tracking, thereby improving the accuracy of velocity calculation of dense explosion fragments. Through this invention, high-precision identification, stable tracking and accurate velocity measurement of explosion fragments can be achieved.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart of a dense fragment detection method based on YOLO and morphological contour extraction according to the present invention.
Figure 2 is a flowchart of a dense fragment detection and velocity calculation method based on YOLO and morphological contour extraction in an embodiment of the present invention.
Figure 3 is a network structure diagram of the improved YOLOv5 model in an embodiment of the present invention.
Figure 4 is a flowchart of a velocity calculation method based on dense fragment detection using YOLO and morphological contour extraction in an embodiment of the present invention.
Figure 5 is a comparison chart of experimental results between the method of the present invention and existing methods.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0020] Example 1. This embodiment provides a dense debris detection method based on YOLO and morphological contour extraction. It addresses the core requirements of explosion debris detection and velocity calculation, with the fusion of deep learning and traditional algorithms at its core. Through the cooperation of the debris detection and velocity calculation blocks, it addresses the pain points of existing technologies. As shown in Figure 1 and Figure 2, the method includes: S1. Construct a fused dataset. Specifically, a sequence of consecutive video images containing the movement of explosion fragments from a real explosion experiment is obtained; keyframe video images are extracted from the sequence, fragment regions in the keyframe video images are labeled, and a real dataset is constructed based on the keyframe video images and their corresponding fragment regions; the contours and textures corresponding to the fragments in the keyframe video images of the real explosion experiment are extracted and synthesized into the background to automatically generate bounding box annotations, and a virtual dataset is constructed based on each keyframe video image and the labeled fragment bounding boxes; the real dataset and the virtual dataset are merged into a fused dataset.

[0021] For example, in one specific embodiment, the steps of acquiring a continuous frame video image sequence containing the movement of explosive fragments in a real explosion test, and extracting the fragment region from each frame video image in the real explosion test, and constructing a real dataset based on each frame video image and its corresponding fragment region, are as follows: A real explosion test is filmed using a high-speed camera to acquire a continuous frame video image sequence containing the movement of explosive fragments in the real explosion test, while simultaneously recording the camera frame rate and camera calibration parameters; key frame video images with intense motion are extracted from the frame video image sequence, and fragment bounding boxes are manually labeled to construct the real dataset. Specifically, the step of extracting key frame video images from the continuous frame video image sequence is as follows: Based on the pixel grayscale change between adjacent frame video images, pixels with a grayscale difference greater than 30 are marked as moving pixels, and the ratio of moving pixels to the total pixels of the entire image is used as the area ratio; video image frames with an area ratio greater than 5% are considered key frame video images with intense motion.

[0022] For example, in one specific embodiment, the steps of extracting the contours and textures corresponding to the fragments in each frame of the real explosion experiment video image and synthesizing them into the background to automatically generate bounding box annotations, and constructing a virtual dataset based on each frame of video image and the annotated fragment bounding boxes are as follows: extracting the contours and textures corresponding to the fragments in each frame of the real explosion experiment video image by pixel-by-pixel segmentation, synthesizing them into the background to automatically generate bounding box annotations, and constructing a virtual dataset.

[0023] For example, in one specific embodiment, the step of fusing real datasets and virtual datasets into a fused dataset is as follows: virtual and real samples are fused at a ratio of 3:1 to form a large-scale fused dataset.
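A small sketch of the 3:1 fusion in [0023] is given below. Only the virtual-to-real ratio is taken from the text; the random subsampling of virtual samples, the shuffling, and the function name are assumptions.

```python
import random

def fuse_datasets(real_samples, virtual_samples, virtual_per_real=3, seed=0):
    """Combine real and virtual samples at roughly virtual_per_real : 1."""
    rng = random.Random(seed)
    # Cap the request at what is available so sampling never fails.
    n_virtual = min(len(virtual_samples), virtual_per_real * len(real_samples))
    chosen_virtual = rng.sample(list(virtual_samples), n_virtual)
    fused = list(real_samples) + chosen_virtual
    rng.shuffle(fused)  # interleave real and virtual samples for training
    return fused
```

With 2 real samples and 10 available virtual samples, the fused set holds 6 virtual and 2 real samples.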

[0024] It should be noted that the main purpose of step S1 is to supplement the gaps in real data through virtual data generation technology. That is, to extract fragment targets pixel by pixel from real explosion images, fully preserving their contour and texture features, and then to synthesize these fragments into different unlabeled background images, automatically generating bounding box annotations; at the same time, to simulate various fragment distributions and fragment diffusion angles, generating virtual data samples with diversity and representativeness; finally, to fuse the virtual samples with limited real data to construct a large-scale training set covering multiple scenarios, providing sufficient data support for subsequent model training.
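The compositing step described in [0024] can be sketched with NumPy as below: a segmented fragment is pasted into a background image through its pixel mask, and the bounding box follows directly from the mask extent. Grayscale images, a boolean mask with at least one true pixel, a patch that fits entirely inside the background, and the function name are all assumptions of this sketch.

```python
import numpy as np

def paste_fragment(background, patch, mask, top, left):
    """Composite a segmented fragment onto a copy of background.

    Returns (image, bbox) with bbox = (x_min, y_min, x_max, y_max), inclusive pixel indices.
    """
    h, w = mask.shape
    out = background.copy()
    region = out[top:top + h, left:left + w]  # a view into the output image
    # Copy only the pixels that belong to the fragment; other patch pixels are ignored.
    region[mask] = patch[mask]
    ys, xs = np.nonzero(mask)
    bbox = (left + int(xs.min()), top + int(ys.min()),
            left + int(xs.max()), top + int(ys.max()))
    return out, bbox
```

Calling this repeatedly with random positions, scales, and backgrounds would yield labeled virtual samples without manual annotation; how the fragment distributions and diffusion angles are simulated is not specified in the text.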

[0025] S2. Construct an improved YOLOv5 model and train it; Specifically, an improved YOLOv5 model is constructed, and the improved YOLOv5 model is trained on the fused dataset to obtain the trained target YOLOv5 model. The improved YOLOv5 model includes: a multi-branch dilated convolutional layer and a clustering analysis layer. The multi-branch dilated convolutional layer is used to extract multi-scale features; the clustering analysis layer is used to cluster the true bounding boxes of debris in the fused dataset, adaptively calculate and generate anchor boxes that fit the actual scale distribution of explosion debris.

[0026] For example, as shown in Figure 3, in one specific embodiment, to address the problem that explosion debris targets are extremely small and traditional pooling downsampling easily leads to the loss of key spatial features, this embodiment uses a dilated convolutional pyramid module to replace the spatial pyramid pooling layer in the original YOLOv5 model. Multi-scale features are extracted through multi-branch dilated convolutional layers, expanding the network's receptive field while maintaining high resolution of the feature map, and fully preserving the edge details and textures of the tiny debris. At the same time, a clustering analysis layer is used to cluster the ground-truth bounding boxes of the debris in the fused dataset, adaptively calculating anchor box parameters that fit the actual scale distribution of explosion debris and replacing the model's original default anchor boxes to match the characteristics of the tiny debris, thus completing a deep improvement of the YOLOv5 model.
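The claim that dilated branches enlarge the receptive field without reducing feature map resolution follows from standard convolution arithmetic, sketched below. The 3 x 3 kernel and the dilation rates 1, 2, and 4 are assumptions chosen for illustration; the text does not state the rates used in the dilated convolutional pyramid module.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def same_padding(kernel_size, dilation):
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2

def conv_output_size(input_size, kernel_size, dilation, padding, stride=1):
    """Standard output-size formula for a (dilated) convolution."""
    span = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - span) // stride + 1

# Hypothetical branch configuration: one 3x3 kernel at three dilation rates.
for d in (1, 2, 4):
    k_eff = effective_kernel_size(3, d)
    out = conv_output_size(20, 3, d, same_padding(3, d))
    print(f"dilation={d}: effective kernel {k_eff}x{k_eff}, output size {out}")
```

Each branch sees a 3, 5, or 9 pixel span while a 20-pixel feature map stays 20 pixels wide, whereas a pooling layer with stride greater than 1 would shrink it.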

[0027] The clustering analysis layer uses the K-means clustering algorithm to cluster the true bounding boxes of debris in the fused dataset, adaptively calculates and generates anchor box parameters that fit the actual scale distribution of explosion debris, and replaces the original default anchor boxes of the model with the anchor box parameters.

[0028] For example, during the training of the improved YOLOv5 model, the pre-trained weights of the improved YOLOv5 model on the open-source, publicly available general object recognition dataset MS COCO are loaded, some backbone parameters of the network are frozen, and the optimizer is used to fine-tune the model on the fused dataset. During the training process, data augmentation operations such as random flipping, rotation, scaling, and color parameter adjustment are added to improve the robustness of the model in different scenarios and complete the model training.
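One detail that the augmentation list in [0028] implies is that geometric transforms must move the bounding box labels together with the pixels. A minimal sketch for horizontal flipping follows; the box convention (`x_max` and `y_max` are exclusive edges in pixel units) and the function name are assumptions.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Mirror an image left-right and move its (x_min, y_min, x_max, y_max) boxes with it."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    # A box spanning [x_min, x_max) maps to [width - x_max, width - x_min).
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for x_min, y_min, x_max, y_max in boxes]
    return flipped, flipped_boxes
```

Rotation and scaling need the analogous coordinate update, while color parameter adjustment leaves the boxes unchanged.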

[0029] It should be noted that the main function of step S2 is to: optimize the traditional YOLOv5 model to adapt to the fragment detection requirements, that is, redesign the anchor box size of the YOLOv5 model according to the actual scale distribution of fragments in the fused dataset, so that it better fits the size characteristics of fragments and improves the adaptability of small target detection; adopt a transfer learning strategy to transfer the weights pre-trained on a large general dataset to the fragment detection task, reducing the training difficulty of the model and accelerating convergence; at the same time, combine data enhancement methods such as flipping, rotation, scaling, and color perturbation to expand the sample distribution range, improve the robustness of the model under different explosion scenarios and lighting conditions, and ensure the model's ability to detect dense small fragments; at the same time, use an improved YOLOv5 model to train on the fused dataset, optimize the detection head structure, enhance the recognition ability of dense small fragments, and integrate traditional image processing methods to preprocess the fragment images to reduce background noise interference and further improve the fragment localization accuracy.

[0030] S3. Detect fragment regions using a trained target YOLOv5 model; Specifically, each frame of the continuous video image sequence to be processed is input into the target YOLOv5 model, and the corresponding fragment region in each frame of the video image is output.

[0031] S4. Obtain the final detection results. Specifically, a foreground mask is generated using a background modeling method, and noise is removed through morphological operations. The foreground contours are obtained using a contour extraction method, and non-fragment regions are filtered out by area conditions to obtain the target fragment region (i.e., the detection result of the traditional algorithm). The intersection over union (IoU) of the target fragment region and the fragment region detected by the target YOLOv5 model is computed. When the IoU is greater than a set threshold, the fragment region detected by the target YOLOv5 model is taken as the final detection result; when the IoU is less than or equal to the set threshold, the target fragment region is taken as the final detection result.
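The decision rule of step S4 can be sketched as below. The IoU formula is standard; the threshold value of 0.5, the pairing of each traditional-algorithm region with its best-overlapping YOLOv5 box, and the function names are assumptions, since the text only refers to "a set threshold" and does not describe how regions are paired. YOLOv5 boxes that overlap no traditional region are not addressed by the text and are ignored here.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(yolo_boxes, contour_boxes, iou_threshold=0.5):
    """For each traditional-algorithm box, keep the YOLO box if they agree, else the traditional box."""
    final = []
    for c in contour_boxes:
        # Assumed pairing: the YOLO box with the highest IoU against this region.
        best = max(yolo_boxes, key=lambda y: iou(y, c), default=None)
        if best is not None and iou(best, c) > iou_threshold:
            final.append(best)   # IoU above threshold: trust the YOLOv5 detection
        else:
            final.append(c)      # otherwise fall back to the traditional result
    return final
```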

[0032] It should be noted that step S4 uses the dynamic physical features extracted by traditional algorithms to strictly constrain the static feature detection results of the deep learning of the YOLOv5 model, and outputs the final fusion detection results.
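The traditional branch of step S4 (foreground mask, morphological denoising, region extraction, area filtering) can be approximated in pure NumPy as sketched below. Several substitutions are assumptions and should be read as such: a single static background image stands in for the background modeling method, a 3 x 3 morphological opening stands in for the unspecified morphological operations, 4-connected component labelling stands in for contour extraction, and the threshold and area bounds are illustrative values.

```python
from collections import deque
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Pixels that differ from the (assumed static) background by more than threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

def _neighbourhoods(mask):
    """Stack of the nine 3x3-neighbourhood shifts of mask, padded with False."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def opening(mask):
    """3x3 erosion followed by 3x3 dilation; removes isolated noise pixels."""
    eroded = _neighbourhoods(mask).all(axis=0)
    return _neighbourhoods(eroded).any(axis=0)

def region_boxes(mask, min_area=4, max_area=400):
    """Bounding boxes (x_min, y_min, x_max, y_max; max exclusive) of 4-connected regions within the area bounds."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        queue, pixels = deque([(int(y0), int(x0))]), []
        seen[y0, x0] = True
        while queue:  # breadth-first flood fill of one region
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if min_area <= len(pixels) <= max_area:  # area condition filters non-fragment regions
            ys, xs = zip(*pixels)
            boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes
```

The boxes returned by `region_boxes(opening(foreground_mask(frame, background)))` play the role of the target fragment regions that are compared against the YOLOv5 detections.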

[0033] Example 2. This embodiment provides a velocity calculation method based on dense fragment detection using YOLO and morphological contour extraction. As shown in Figure 4, the method includes: based on the fragment regions in each frame of video images obtained in Example 1, obtaining the position, bounding box, and frame index of the fragment regions in each frame of video images; obtaining the motion trajectory of each fragment in a continuous frame video image sequence based on the positional similarity between each pair of fragment regions in two adjacent frame video images; calculating the time interval based on the camera's frame rate to obtain the single-frame velocity; averaging all single-frame velocities in the motion trajectory of the same fragment in continuous frame video images to obtain the average velocity of the fragment; and calculating the ratio between pixels and the actual physical scale based on camera calibration parameters to convert the pixel velocity into the actual physical velocity of the fragment.

[0034] The method also includes result output and visualization: drawing fragment bounding boxes, velocity values, and velocity directions on the video frames; generating distributions of fragment count and velocity direction; storing fragment identifiers, average velocities, trajectory coordinates, and confidence information as general-format files; and saving video results in common video formats.

[0035] It should be noted that the purpose of Example 2 is to employ a matching strategy combining "positional similarity + appearance feature similarity" between adjacent video frames. Specifically, spatial correlation is determined by calculating the positional distance between the detected target in the current frame and the tracked target in the previous frame, maintaining the continuity and stability of the trajectory and preventing target loss or confusion during tracking. Simultaneously, feature similarity is calculated by extracting fragment appearance features, combining both similarities to achieve accurate target association. For successfully matched targets, the position and frame index information in their trajectory container are updated. New targets that are not matched are assigned a new ID and a new trajectory is established. Targets that have not been matched for a long time are determined to have left the detection area or been completely occluded, terminating their trajectory tracking to ensure the continuous stability of the trajectory in complex scenes. After trajectory maintenance is completed, the displacement of fragments between adjacent frames is calculated based on the continuous frame position information recorded in the trajectory container. The time interval between adjacent frames is determined by combining the video frame rate, and the single-frame pixel velocity of the fragment is obtained by dividing the displacement by the time interval. The average pixel velocity of the fragment is obtained by averaging all single-frame velocities in the same trajectory, reducing the impact of instantaneous errors. The calculation results are associated with the fragment ID, and velocity direction distribution data and fragment quantity change curves are generated simultaneously.
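The trajectory maintenance described in [0035] (match, update, create, terminate) can be sketched as below. This simplified version matches on positional distance only; the appearance feature similarity mentioned in the text is omitted. The greedy nearest-neighbour assignment, the distance gate, the missed-frame limit, and the class name are assumptions for illustration.

```python
import math

class FragmentTracker:
    """Greedy nearest-neighbour tracker over fragment centre points (position only)."""

    def __init__(self, max_distance=20.0, max_missed=5):
        self.max_distance = max_distance  # assumed gate on inter-frame displacement, in pixels
        self.max_missed = max_missed      # assumed number of unmatched frames before termination
        self.next_id = 0
        self.active = {}    # track id -> {"points": [(frame, x, y)], "missed": int}
        self.finished = {}  # terminated trajectories

    def update(self, frame_index, centres):
        unmatched = list(centres)
        for tid, track in list(self.active.items()):
            _, px, py = track["points"][-1]
            best = min(unmatched, key=lambda c: math.hypot(c[0] - px, c[1] - py), default=None)
            if best is not None and math.hypot(best[0] - px, best[1] - py) <= self.max_distance:
                # Matched: extend the trajectory container with position and frame index.
                track["points"].append((frame_index, best[0], best[1]))
                track["missed"] = 0
                unmatched.remove(best)
            else:
                track["missed"] += 1
                if track["missed"] > self.max_missed:
                    # Unmatched for too long: treated as having left the view or been occluded.
                    self.finished[tid] = self.active.pop(tid)
        for x, y in unmatched:
            # Detections that matched no track start a new trajectory with a new ID.
            self.active[self.next_id] = {"points": [(frame_index, x, y)], "missed": 0}
            self.next_id += 1
```

Each trajectory's `points` list has the `(frame_index, x, y)` layout needed to compute displacements between adjacent frames and, from those, single-frame and average velocities.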

[0036] Example 3: This embodiment provides a dense debris detection system based on YOLO and morphological contour extraction. The detection system includes a dataset construction module, an improved YOLOv5 model construction and training module, and a detection result output module. The dataset construction module is used to acquire a sequence of consecutive video frames containing the movement of explosion debris from a real explosion experiment; extract keyframe video images from the sequence, annotate the debris regions in the keyframe video images, and construct a real dataset based on the keyframe video images and their corresponding debris regions; extract the contours and textures of the debris in the keyframe video images of the real explosion experiment and synthesize them into a background to automatically generate bounding box annotations; construct a virtual dataset based on each keyframe video image and the annotated debris bounding boxes; and fuse the real dataset and the virtual dataset to obtain a fused dataset. The improved YOLOv5 model construction and training module is used to construct an improved YOLOv5 model and train it on the fused dataset to obtain a trained target YOLOv5 model. The improved YOLOv5 model comprises a multi-branch dilated convolutional layer and a clustering analysis layer. The multi-branch dilated convolutional layer is used to extract multi-scale features; the clustering analysis layer is used to cluster the ground-truth bounding boxes of debris in the fused dataset and to adaptively calculate and generate anchor boxes that fit the actual scale distribution of explosion debris. The detection result output module is used to input each frame of the video sequence to be processed into the target YOLOv5 model and output the corresponding debris region in each frame; generate a foreground mask using a background modeling method and remove noise through morphological operations; obtain the foreground contours using a contour extraction method and filter out non-debris regions by setting area conditions to obtain the target debris region; and obtain the intersection over union (IoU) of the target debris region and the debris region detected by the target YOLOv5 model. When the IoU is greater than a set threshold, the debris region detected by the target YOLOv5 model is taken as the final detection result; when the IoU is less than or equal to the set threshold, the target debris region is taken as the final detection result.

[0037] The present invention is described below with reference to specific data and the accompanying drawings.
I. First, keyframes are extracted from the explosion video sequence using moving target detection, capturing clear images of the moment the debris is generated. To address the shortage of data, a virtual data generation method is employed: debris targets extracted pixel by pixel from real images are composited onto unlabeled background images, and bounding box annotations are generated automatically using computer vision algorithms. Simulating different explosion scenarios and lighting conditions expands the diversity of the data, making the dataset more comprehensive and representative.
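The keyframe selection step can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as NumPy `uint8` arrays; the thresholds (grayscale difference greater than 30, moving-area ratio greater than 5%) are the values stated in claim 2, while the function names `moving_area_ratio` and `select_keyframes` are hypothetical and not taken from the patent.

```python
import numpy as np

GRAY_DIFF_THRESHOLD = 30      # grayscale change that marks a pixel as moving (claim 2)
AREA_RATIO_THRESHOLD = 0.05   # moving-pixel fraction that marks a keyframe (claim 2)


def moving_area_ratio(prev_gray, curr_gray, diff_threshold=GRAY_DIFF_THRESHOLD):
    """Return the fraction of pixels whose grayscale change exceeds diff_threshold."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return float(np.count_nonzero(diff > diff_threshold)) / diff.size


def select_keyframes(frames, diff_threshold=GRAY_DIFF_THRESHOLD,
                     ratio_threshold=AREA_RATIO_THRESHOLD):
    """Return indices of frames whose moving-area ratio relative to the
    previous frame exceeds ratio_threshold."""
    keyframes = []
    for i in range(1, len(frames)):
        ratio = moving_area_ratio(frames[i - 1], frames[i], diff_threshold)
        if ratio > ratio_threshold:
            keyframes.append(i)
    return keyframes
```

In practice the frames would come from a video decoder and be converted to grayscale first; that step is omitted here.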

[0038] II. Before training the model, the size distribution of fragments in the dataset was statistically analyzed, and the average width and height values were calculated to adjust the anchor box sizes of YOLOv5 to better fit the fragment targets. Simultaneously, a transfer learning strategy was employed: weights pre-trained on a large general dataset were applied to this task and fine-tuned on the fused dataset to accelerate model convergence. During training, data augmentation operations such as random flipping, rotation, scaling, and color jitter were added to improve the robustness of the model in different environments.
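The anchor adaptation can be illustrated with a small clustering sketch. Claim 5 names K-means as the clustering algorithm; everything else here is an assumption made for illustration: the function name `kmeans_anchors`, the use of Euclidean distance on (width, height) pairs, the random initialization, and the default of nine anchors.

```python
import numpy as np


def kmeans_anchors(box_sizes, k=9, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchor sizes with plain k-means.

    box_sizes: sequence of (width, height) of ground-truth boxes, in pixels.
    Requires len(box_sizes) >= k. Returns a (k, 2) array sorted by area.
    """
    boxes = np.asarray(box_sizes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct boxes chosen at random.
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iterations):
        # Distance from every box to every center (Euclidean, an assumption).
        dists = np.linalg.norm(boxes[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old center if its cluster is empty.
        new_centers = np.array([
            boxes[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]
```

The resulting sizes would replace the default anchors of the model. Some YOLO tooling clusters with an IoU-based distance instead of the Euclidean distance used here; the patent does not state which was used.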

[0039] III. On top of the improved YOLOv5 detection, the moving foreground is extracted using background subtraction. Noise is removed and target edges are smoothed through morphological erosion and dilation operations. The target boundaries are then located by contour analysis, and excessively large or small non-fragment areas are removed by area filtering. Fusing this result with the detections of the improved YOLOv5 effectively reduces false positives and false negatives in complex backgrounds.
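The fusion rule between the contour-based regions and the YOLOv5 detections can be sketched as below. The rule itself (keep the YOLOv5 box when the IoU exceeds a threshold, otherwise keep the contour region) follows the description; the box format `(x1, y1, x2, y2)`, the threshold value 0.5, the pairing of each contour region with its best-overlapping YOLOv5 box, and the function names are assumptions. The source does not say how YOLOv5 boxes with no corresponding contour region are handled, so this sketch leaves them out.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(yolo_boxes, contour_boxes, iou_threshold=0.5):
    """For each contour region, keep the best-overlapping YOLO box if their
    IoU exceeds iou_threshold; otherwise keep the contour region itself."""
    fused = []
    for contour_box in contour_boxes:
        best_box, best_iou = None, 0.0
        for yolo_box in yolo_boxes:
            value = iou(contour_box, yolo_box)
            if value > best_iou:
                best_box, best_iou = yolo_box, value
        fused.append(best_box if best_iou > iou_threshold else contour_box)
    return fused
```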

[0040] IV. In the detection results, each fragment is assigned a unique identifier and a trajectory container is established to record its position and frame index in each frame. Adjacent frames are matched by calculating the Euclidean distance:

$$d = \sqrt{(x_c - x_p)^2 + (y_c - y_p)^2}$$

[0041] In the formula, $d$ denotes the positional distance between two fragment regions in two adjacent video frames; $x_c$ and $y_c$ denote the x and y coordinates of the center point of the fragment region to be matched in the current video frame; and $x_p$ and $y_p$ denote the x and y coordinates of the center point of the tracked fragment region in the previous video frame.

[0042] The tracking process combines cosine similarity based on appearance features to achieve optimal matching among multiple targets. Successfully matched targets update their trajectories, while unmatched targets are assigned new IDs, ensuring the continuity and stability of the tracking process.
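One way to combine the two cues is sketched below. It is an illustration only: the patent states that the Euclidean distance between centers and the cosine similarity of appearance features are combined, but not how. The weighted cost, the gating distance `max_distance`, the weight `distance_weight`, the greedy assignment, and all names are assumptions. A greedy assignment is a simplification and does not guarantee a globally optimal matching.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def match_fragments(tracks, detections, max_distance=50.0, distance_weight=0.5):
    """Greedily associate current-frame detections with previous-frame tracks.

    tracks: dict mapping track ID -> (center, feature) from the previous frame.
    detections: list of (center, feature) from the current frame.
    Returns (matches, unmatched): matches maps detection index -> track ID,
    unmatched lists detection indices that should start new trajectories.
    """
    candidates = []
    for track_id, (t_center, t_feature) in tracks.items():
        for det_index, (d_center, d_feature) in enumerate(detections):
            distance = math.hypot(d_center[0] - t_center[0], d_center[1] - t_center[1])
            if distance > max_distance:
                continue  # too far apart to be the same fragment
            appearance_cost = 1.0 - cosine_similarity(t_feature, d_feature)
            cost = (distance_weight * distance / max_distance
                    + (1.0 - distance_weight) * appearance_cost)
            candidates.append((cost, track_id, det_index))
    candidates.sort()  # lowest combined cost first
    matches, used_tracks = {}, set()
    for cost, track_id, det_index in candidates:
        if track_id in used_tracks or det_index in matches:
            continue
        matches[det_index] = track_id
        used_tracks.add(track_id)
    unmatched = [i for i in range(len(detections)) if i not in matches]
    return matches, unmatched
```

Detections returned in `unmatched` would be assigned new IDs, as described above.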

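[0043] V. If the trajectory information is complete, the single-frame velocity $v$ is calculated from the displacement $\Delta s$ between two consecutive frames and the time interval $\Delta t$ between them, which is the reciprocal of the video frame rate:

$$v = \frac{\Delta s}{\Delta t}$$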

[0044] The system then averages all valid velocities in the fragment's trajectory to obtain the average pixel velocity. For targets with trajectories shorter than two frames, velocity statistics are not performed. Finally, the system associates the average velocity of each fragment with its corresponding ID and outputs a velocity list, velocity distribution map, and fragment density variation curve, providing a visual representation of the results.
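The velocity statistics can be sketched as follows. The averaging of single-frame velocities and the exclusion of trajectories shorter than two frames follow the description, and the pixel-to-physical conversion follows claim 7. The trajectory layout `(frame_index, x, y)`, the handling of skipped frames, and the names `average_pixel_velocity`, `to_physical_velocity`, and `metres_per_pixel` are assumptions.

```python
import math


def average_pixel_velocity(trajectory, fps):
    """Average speed of one fragment in pixels per second.

    trajectory: list of (frame_index, x, y) sorted by frame index.
    Returns None when the trajectory has fewer than two points,
    since no velocity statistics are performed for such targets.
    """
    if len(trajectory) < 2:
        return None
    speeds = []
    for (f0, x0, y0), (f1, x1, y1) in zip(trajectory, trajectory[1:]):
        frame_gap = f1 - f0
        if frame_gap <= 0:
            continue  # skip malformed or duplicated entries
        displacement = math.hypot(x1 - x0, y1 - y0)
        # Time interval is frame_gap / fps, so speed = displacement * fps / frame_gap.
        speeds.append(displacement * fps / frame_gap)
    return sum(speeds) / len(speeds) if speeds else None


def to_physical_velocity(pixel_velocity, metres_per_pixel):
    """Convert pixels per second to metres per second using a scale factor
    that would come from camera calibration."""
    return pixel_velocity * metres_per_pixel
```

For example, a fragment that moves 5 pixels per frame at 100 frames per second has an average pixel velocity of 500 pixels per second.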

[0045] The experimental conditions were as follows: an Intel® i9-10920X 3.5 GHz CPU, an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory, 128 GB of RAM, and the Windows 10 operating system. The experimental program was written in Python. The data consisted of 2800 training images and 700 test images collected during actual explosions. Training was configured with a maximum of 3500 epochs, an initial learning rate of 0.00001, and a batch size of 8.

[0046] Experimental verification: To demonstrate the effectiveness of the proposed method, SET, UGS, FID-PGD, EV-SpSegNet, and SOE-YOLO were selected as comparison algorithms on the fragment dataset. Specifically, SET was proposed in "Sun H, Wang R, Li Y, et al. Set: Spectral enhancement for tiny object detection. Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 4713-4723, 2025."; UGS was proposed in "Sun H, Li Y, Yang L, et al. Uncertainty-Aware Gradient Stabilization for Small Object Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8407-8417, 2025."; FID-PGD was proposed in "Bian J, Feng M, Dong W, et al. Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection. Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 30376-30386, 2025."; EV-SpSegNet was proposed in "Chen N, Xiao C, Dai Y, et al. Event-based tiny object detection: A benchmark dataset and baseline. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7209-7218, 2025."; and SOE-YOLO was proposed in "Zhang C, Gu C, Duan Q, et al. SOE-YOLO: A Small Object Enhancement Detection Network[J]. IEEE Sensors Journal, 2025." The experimental results are shown in Figure 5.

[0047] This invention uses root mean square error (RMSE) to calculate the velocity error of explosion fragments. Precision and recall metrics are used to measure the localization performance. Experiments were conducted on a test set, and the results are shown in Table 1.
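For reference, the three metrics can be computed as in the sketch below. The formulas are the standard definitions of RMSE, precision, and recall; the function names are hypothetical, and the criterion for counting a detection as a true positive (for example an IoU threshold against the annotation) is not given in the source and is therefore left to the caller.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predicted and reference velocities."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall
```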

[0048] Table 1

[0049] As shown in Table 1, on the fragment test set the present invention achieves a velocity root mean square error of 0.13 pixels/frame, with precision and recall of 84% and 87%, respectively, verifying its effectiveness. Overall, the present invention makes full use of both deep learning and traditional methods for fragment detection and fragment velocity calculation, and can effectively support damage assessment.

[0050] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A dense fragment detection method based on YOLO and morphological contour extraction, characterized in that it comprises:
obtaining a sequence of consecutive video frames containing the movement of explosion fragments from a real explosion test;
extracting keyframe video images from the sequence of consecutive video frames, annotating the fragment regions in the keyframe video images, and constructing a real dataset based on the keyframe video images and their corresponding fragment regions;
extracting the contours and textures of the fragments in the keyframe video images of the real explosion test and synthesizing them into a background to automatically generate bounding box annotations; constructing a virtual dataset based on each keyframe video image and the annotated fragment bounding boxes; and fusing the real dataset and the virtual dataset into a fused dataset;
constructing an improved YOLOv5 model and training it on the fused dataset to obtain a trained target YOLOv5 model, wherein the improved YOLOv5 model includes a multi-branch dilated convolutional layer and a clustering analysis layer, the multi-branch dilated convolutional layer being used to extract multi-scale features, and the clustering analysis layer being used to cluster the ground-truth bounding boxes of fragments in the fused dataset and to adaptively calculate and generate anchor boxes that fit the actual scale distribution of explosion fragments;
inputting each frame of the video sequence to be processed into the target YOLOv5 model and outputting the corresponding fragment region of each frame;
generating a foreground mask using a background modeling method and removing noise by morphological operations; obtaining the foreground contours using a contour extraction method and filtering out non-fragment regions by setting area conditions to obtain the target fragment region;
obtaining the intersection over union (IoU) of the target fragment region and the fragment region detected by the target YOLOv5 model, wherein if the IoU is greater than a set threshold, the fragment region detected by the target YOLOv5 model is taken as the final detection result, and if the IoU is less than or equal to the set threshold, the target fragment region is taken as the final detection result.

2. The dense fragment detection method based on YOLO and morphological contour extraction according to claim 1, characterized in that the step of extracting keyframe video images from the sequence of consecutive video frames comprises: based on the pixel grayscale change between adjacent video frames, marking pixels with a grayscale difference greater than 30 as moving pixels; taking the ratio of the number of moving pixels to the total number of pixels in the image as the area ratio; and taking video frames with an area ratio greater than 5% as keyframe video images with intense motion.

3. The dense fragment detection method based on YOLO and morphological contour extraction according to claim 1, characterized in that the contours and textures of fragments are extracted from each video frame of the real explosion test by pixel-by-pixel segmentation and synthesized into a background to automatically generate bounding box annotations, thereby constructing the virtual dataset.

4. The dense fragment detection method based on YOLO and morphological contour extraction according to claim 1, characterized in that the spatial pyramid pooling layer in the improved YOLOv5 model is a dilated convolution pyramid layer.

5. The dense fragment detection method based on YOLO and morphological contour extraction according to claim 1, characterized in that the clustering analysis layer uses the K-means clustering algorithm to cluster the ground-truth bounding boxes of fragments in the fused dataset, adaptively calculates and generates anchor box parameters that fit the actual scale distribution of explosion fragments, and replaces the original default anchor boxes of the model with these anchor box parameters.

6. The dense fragment detection method based on YOLO and morphological contour extraction according to claim 1, characterized in that data augmentation operations are performed during the training of the improved YOLOv5 model, the operations including random flipping, rotation, scaling, and color parameter adjustment.

7. A speed calculation method for dense fragment detection based on YOLO and morphological contour extraction, characterized in that it comprises:
calculating the time interval based on the frame rate of the camera to obtain the single-frame velocity;
averaging all single-frame velocities in the motion trajectory of the same fragment across the consecutive video frames, the fragment regions being obtained by the dense fragment detection method based on YOLO and morphological contour extraction according to any one of claims 1-6, to obtain the average velocity of the fragment;
calculating the ratio between pixels and the actual physical scale based on camera calibration parameters, and converting the pixel velocity into the actual physical velocity of the fragment.

8. The speed calculation method for dense fragment detection based on YOLO and morphological contour extraction according to claim 7, characterized in that velocity statistics are not performed on targets whose motion trajectory is shorter than two frames.

9. The speed calculation method for dense fragment detection based on YOLO and morphological contour extraction according to claim 7, characterized in that the step of obtaining the motion trajectory of the same fragment in the consecutive video frames comprises:
obtaining the position, bounding box, and frame index of the fragment region in each of the consecutive video frames, the fragment regions being obtained by the dense fragment detection method based on YOLO and morphological contour extraction according to any one of claims 1-6;
obtaining the motion trajectory of each fragment in the video sequence based on the positional similarity between each pair of fragment regions in two adjacent video frames.

10. A dense fragment detection system based on YOLO and morphological contour extraction, characterized in that the detection system comprises:
a dataset construction module, used to acquire a sequence of consecutive video frames containing the movement of explosion fragments from a real explosion test; extract keyframe video images from the sequence, annotate the fragment regions in the keyframe video images, and construct a real dataset based on the keyframe video images and their corresponding fragment regions; extract the contours and textures of the fragments in the keyframe video images of the real explosion test and synthesize them into a background to automatically generate bounding box annotations; construct a virtual dataset based on each keyframe video image and the annotated fragment bounding boxes; and fuse the real dataset and the virtual dataset into a fused dataset;
an improved YOLOv5 model construction and training module, used to construct an improved YOLOv5 model and train it on the fused dataset to obtain a trained target YOLOv5 model, wherein the improved YOLOv5 model includes a multi-branch dilated convolutional layer and a clustering analysis layer, the multi-branch dilated convolutional layer being used to extract multi-scale features, and the clustering analysis layer being used to cluster the ground-truth bounding boxes of fragments in the fused dataset and to adaptively calculate and generate anchor boxes that fit the actual scale distribution of explosion fragments;
a detection result output module, used to input each frame of the video sequence to be processed into the target YOLOv5 model and output the corresponding fragment region in each frame; generate a foreground mask using a background modeling method and remove noise through morphological operations; obtain the foreground contours using a contour extraction method and filter out non-fragment regions by setting area conditions to obtain the target fragment region; and obtain the intersection over union (IoU) of the target fragment region and the fragment region detected by the target YOLOv5 model, wherein if the IoU is greater than a set threshold, the fragment region detected by the target YOLOv5 model is taken as the final detection result, and if the IoU is less than or equal to the set threshold, the target fragment region is taken as the final detection result.