Small target detection data augmentation method based on reinforcement learning

By optimizing the data augmentation strategy for small target detection with a reinforcement learning-based, two-stage decoupled search, the method addresses the high search complexity and policy rigidity of existing techniques and achieves fast, efficient data augmentation, thereby improving small target detection accuracy.

CN122244591A · Pending · Publication Date: 2026-06-19 · SPECIAL EQUIP SAFETY SUPERVISION INSPECTION INST OF JIANGSU PROVINCE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SPECIAL EQUIP SAFETY SUPERVISION INSPECTION INST OF JIANGSU PROVINCE
Filing Date
2026-03-25
Publication Date
2026-06-19

Figure CN122244591A_ABST

Abstract

This invention provides a reinforcement learning-based data augmentation method for small target detection. Images of small targets are acquired and divided into training and validation sets according to image count and class. The hyperparameters of a reinforcement learning controller are set, and a set of candidate image augmentation operations is determined. A two-stage decoupled search is then employed:
- the search space is decomposed into an individual parameter search and a policy combination search;
- the individual parameter search space is determined, and the individual parameter search is performed;
- the policy combination search space is determined, and within it the reinforcement learning controller continues searching to obtain the optimal policy combination;
- the optimal policy combination is used for data augmentation.

The method significantly reduces search time, which makes reinforcement learning practical for data augmentation. The selected optimal policy combination enriches the dataset and improves small target detection accuracy.

Description

Technical Field

[0001] This invention relates to a reinforcement learning-based data augmentation method for small target detection, belonging to the field of target detection data augmentation technology.

Background Technology

[0002] In computer vision, object detection is a core technology for scenarios such as autonomous driving and remote sensing monitoring. Its performance depends heavily on the scale and quality of the training data. Traditional data augmentation methods expand the dataset through geometric transformations (flipping, cropping) and photometric perturbations (brightness adjustment, noise injection). These improve model generalization, but they have fundamental limitations. First, strategy design relies on prior experience, which makes it difficult to adapt to the characteristics of complex tasks. Second, there is no targeted optimization for challenging scenarios such as small or occluded targets. With the development of automated data augmentation, the problem has shifted: balancing search efficiency against augmentation effect has become the key bottleneck for practical deployment. Historically, data augmentation for image recognition was designed manually, and the best strategies were dataset-specific. For example, on MNIST most top-ranked models used elastic distortion, scaling, translation, and rotation. On natural image datasets such as CIFAR-10 and ImageNet, random cropping, image mirroring, and color shifting/whitening are more common. Because these methods are manually designed, they require expertise and time. Methods that learn augmentation strategies from data can, in principle, be applied to any dataset rather than only the one they were designed for.

[0003] AutoAugment, proposed in 2018, was the first method to apply reinforcement learning to augmentation policy search. It improved accuracy on CIFAR-10 and other benchmarks, but at a search cost of up to 15,000 GPU hours. Its "brute-force search" paradigm exposed two major drawbacks: 1) Curse of dimensionality: jointly optimizing the probability, magnitude, and order of 16 operations makes the search space extremely large, so computational cost grows exponentially; 2) Policy rigidity: a static policy found by offline search cannot respond to the dynamics of model training and is difficult to adapt to the needs of different convergence stages. Subsequent studies based on evolutionary algorithms or gradient-based optimization have reduced this overhead, but search cost remains a practical obstacle. In small object detection in particular, these methods face three challenges: 1) Feature degradation: random cropping may remove small objects that occupy only tens of pixels; 2) Scale mismatch: fixed scaling policies cannot adapt to multi-scale object distributions; 3) Semantic distortion: generative augmentation easily destroys the semantic consistency between the target and the background.

Summary of the Invention

[0004] The purpose of this invention is to provide a reinforcement learning-based data augmentation method for small target detection to address the problems in existing technologies, such as extremely high search space complexity, exponentially increasing computational costs, long search times, and rigid strategies.

[0005] The technical solution of this invention is a reinforcement learning-based data augmentation method for small object detection comprising the following steps: Step 1: obtain small target images, divide them into training and validation sets according to the number of images and their classes, and use the training set to train the target detection model; Step 2: set the hyperparameters of the reinforcement learning controller and determine the set of candidate image augmentation operations; Step 3: adopt a two-stage decoupled search: decompose the search space and divide the search into an individual parameter search and a strategy combination search; Step 4: determine the individual parameter search space: define the parameter list for each operation in the set of candidate image augmentation operations; Step 5: perform the individual parameter search: in the individual parameter search space, use the reinforcement learning controller to search the parameter space of each data augmentation operation individually, using the detection results of the target detection model on the validation set as the evaluation index, to obtain the optimal parameters of each image augmentation operation; Step 6: determine the strategy combination search space: based on the optimal parameters of each image augmentation operation obtained in Step 5, arrange the operations in a set order and set a state S for each operation; Step 7: in the strategy combination search space, continue searching with the reinforcement learning controller, using the detection results of the target detection model on the validation set as the evaluation index, to obtain the optimal strategy combination; Step 8: use the optimal strategy combination obtained in Step 7 to perform data augmentation on the image dataset.

[0006] Furthermore, in step 1, the target detection model uses the YOLOv8 network.

[0007] Furthermore, in step 2, the set of candidate image augmentation operations includes two or more of the following: magnification, duplication, noise addition, highlighting, flipping, image contrast enhancement, and cropping.

[0008] Furthermore, in step 6, state S = 0 indicates that the image augmentation operation is discarded (not used), and state S = 1 indicates that it is used.

[0009] The beneficial effects of this invention are: I. This reinforcement learning-based data augmentation method for small target detection uses a two-stage decoupled search to replace the original one-step search strategy. Compared with existing technologies, this invention greatly reduces the search time, enabling reinforcement learning to be effectively used in the field of data augmentation and effectively enriching the dataset. The optimal strategy combination obtained by this method can effectively improve the accuracy of small target detection.

[0010] II. This reinforcement learning-based data augmentation method for small target detection addresses the efficiency bottleneck often encountered by traditional reinforcement learning applications in automatic data augmentation due to the large and complex search space. This invention effectively solves this problem through a carefully designed two-stage decoupled search, significantly simplifying the exploration dimensions of reinforcement learning. Based on this simplified search space, this invention successfully realizes the effective application of reinforcement learning in the field of target detection data augmentation. The method of this invention has advantages such as fast search speed, effective improvement of detection accuracy, and effective enrichment of the dataset, and is particularly suitable for improving the detection accuracy of small targets in resource-constrained scenarios.

Attached Figure Description

[0011] Figure 1 is a schematic diagram of the reinforcement learning-based data augmentation method for small target detection according to an embodiment of the present invention; Figure 2 is a schematic diagram of the two-stage decoupled search employed in the embodiment.

Detailed Implementation

[0012] The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0013] The embodiment provides a reinforcement learning-based data augmentation method for small target detection which, as shown in Figure 1, includes the following steps: Step 1: Obtain small target images, divide them into training and validation sets according to the number of images and classification, and use the training set to train the target detection model.

[0014] In Step 1, the object detection model uses a YOLOv8 network. This model is used to evaluate the data augmentation parameters and augmentation combination strategies selected in subsequent steps. The validation set serves as the search dataset for those parameters and strategies: the reinforcement learning controller searches for the optimal data augmentation parameters and strategies on this dataset based on the evaluation metrics provided by the object detection model.

[0015] Step 2: Set the hyperparameters of the reinforcement learning controller and determine the set of image augmentation operations to be selected.

[0016] In step 2, the reinforcement learning controller is a recurrent neural network (other network structures, such as a convolutional neural network, may also be used). Its hyperparameters include the number of network layers, the number of neurons, the activation function, and the optimizer. The controller is designed to generate a policy sequence autoregressively. Its input is the current state of an image augmentation operation, and its output is the updated state of that operation (consisting of the operation type and intensity). The controller is trained with a reward signal, namely the evaluation metric provided by the object detection model. In this embodiment, the sub-model (the object detection model trained with the candidate augmentation) is trained on the training set, which excludes the validation set. It is then evaluated on the validation set to measure its generalization ability. The resulting accuracy is used as the reward signal for training the recurrent network controller.
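The controller described above can be sketched as follows. This is a minimal, hypothetical illustration. It assumes a single-layer Elman RNN written in NumPy, a discrete intensity index per operation, and random, untrained weights. The class name `RNNController` and its parameters (`n_ops`, `n_levels`, `hidden`) are illustrative. The policy-gradient training step, which rewards the controller with validation accuracy, is omitted.

```python
import numpy as np

class RNNController:
    """Hypothetical Elman-RNN controller that autoregressively emits
    (operation type, intensity index) pairs. Weights are random here;
    training with a policy-gradient method such as REINFORCE is not shown."""

    def __init__(self, n_ops, n_levels, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_ops, self.n_levels = n_ops, n_levels
        n_tokens = n_ops + n_levels + 1            # op tokens, level tokens, start token
        self.start = n_ops + n_levels
        self.embed = self.rng.normal(0, 0.1, (n_tokens, hidden))
        self.W_h = self.rng.normal(0, 0.1, (hidden, hidden))
        self.W_op = self.rng.normal(0, 0.1, (hidden, n_ops))
        self.W_lv = self.rng.normal(0, 0.1, (hidden, n_levels))

    def _step(self, h, token):
        # Elman update: new hidden state from the previous state and the last token
        return np.tanh(h @ self.W_h + self.embed[token])

    @staticmethod
    def _softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sample(self, length):
        """Sample `length` (op, level) pairs and their total log-probability."""
        h = np.zeros(self.W_h.shape[0])
        token, policy, log_prob = self.start, [], 0.0
        for _ in range(length):
            h = self._step(h, token)
            p_op = self._softmax(h @ self.W_op)
            op = int(self.rng.choice(self.n_ops, p=p_op))
            h = self._step(h, op)                   # feed the chosen op back in
            p_lv = self._softmax(h @ self.W_lv)
            lv = int(self.rng.choice(self.n_levels, p=p_lv))
            token = self.n_ops + lv                 # next step conditions on this level
            log_prob += np.log(p_op[op]) + np.log(p_lv[lv])
            policy.append((op, lv))
        return policy, log_prob

ctrl = RNNController(n_ops=7, n_levels=5)
policy, log_prob = ctrl.sample(3)
print(policy, log_prob)
```

The accumulated `log_prob` is what a REINFORCE update would scale by the reward (validation accuracy minus a baseline) when training such a controller.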

[0017] The set of candidate image augmentation operations includes two or more of the following: magnification, duplication, noise addition, highlighting, flipping, image contrast enhancement, and cropping. Specifically:
- Magnification: the image is magnified by a given factor about the center of the detection box, ensuring that the detection box remains within the image after augmentation. Magnification makes small targets appear somewhat larger while preserving surrounding context.
- Duplication: the content of the detection box is copied multiple times to other locations in the image. The model therefore sees the target's features several times, which mitigates the scarcity of small target samples.
- Noise addition: noise is added inside the detection box; here, salt-and-pepper noise is used. Salt-and-pepper noise (randomly occurring black and white pixels) effectively simulates pixel anomalies caused by camera sensor faults, transmission interference, or low-light conditions, improving the model's tolerance to hardware defects.
- Highlighting: the brightness of pixels within the highlighted areas is increased. Because the dataset consists of grayscale (black-and-white) detection images, targeted enhancement of brightness and contrast in these areas can significantly improve target detection performance.
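The four operations detailed above can be illustrated with the following NumPy sketch. It applies them to a grayscale image with an axis-aligned box given as (x1, y1, x2, y2) in pixels. Several details are assumptions for illustration, not the embodiment's implementation: the box format, the function names, nearest-neighbour resizing, and the parameter names (`factor`, `n_copies`, `amount`, `delta`).

```python
import numpy as np

def magnify(img, box, factor):
    """Zoom in by `factor` about the box centre; returns the image and the rescaled box.
    Assumes the box fits inside the crop window, which holds for small targets."""
    h, w = img.shape[:2]
    ch, cw = int(round(h / factor)), int(round(w / factor))
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    # Crop window centred on the box, shifted if needed to stay inside the image
    x0 = int(np.clip(round(cx - cw / 2), 0, w - cw))
    y0 = int(np.clip(round(cy - ch / 2), 0, h - ch))
    crop = img[y0:y0 + ch, x0:x0 + cw]
    # Nearest-neighbour resize back to the original size
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    out = crop[rows[:, None], cols]
    sx, sy = w / cw, h / ch
    new_box = ((box[0] - x0) * sx, (box[1] - y0) * sy,
               (box[2] - x0) * sx, (box[3] - y0) * sy)
    return out, new_box

def duplicate(img, box, n_copies, rng):
    """Paste the box content at random locations (overlap checks omitted)."""
    out = img.copy()
    x1, y1, x2, y2 = map(int, box)
    patch = img[y1:y2, x1:x2]
    ph, pw = patch.shape[:2]
    h, w = img.shape[:2]
    boxes = [box]
    for _ in range(n_copies):
        nx = int(rng.integers(0, w - pw + 1))
        ny = int(rng.integers(0, h - ph + 1))
        out[ny:ny + ph, nx:nx + pw] = patch
        boxes.append((nx, ny, nx + pw, ny + ph))
    return out, boxes

def salt_and_pepper(img, box, amount, rng):
    """Set a fraction `amount` of pixels inside the box to black (0) or white (255)."""
    out = img.copy()
    x1, y1, x2, y2 = map(int, box)
    region = out[y1:y2, x1:x2]          # a view, so writes go into `out`
    mask = rng.random(region.shape[:2])
    region[mask < amount / 2] = 0
    region[mask > 1 - amount / 2] = 255
    return out

def brighten(img, box, delta):
    """Add `delta` to the pixel intensities inside the box, clipped to [0, 255]."""
    out = img.astype(np.int16)
    x1, y1, x2, y2 = map(int, box)
    out[y1:y2, x1:x2] += delta
    return np.clip(out, 0, 255).astype(np.uint8)
```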

[0018] Step 3: Employ a two-stage decoupled search, as shown in Figure 2: decompose the search space and divide the search operation into individual parameter search and strategy combination search.

[0019] In step 3, the set of candidate image augmentation operations contains multiple operations, and the parameters of each must be searched for their optimal values. Searching all of them jointly would produce a huge search space (for comparison, AutoAugment required up to 15,000 GPU hours of compute) and make the search impractical. The embodiment therefore divides the search into an individual parameter search and a strategy combination search, decomposing the high-dimensional search space into multiple low-dimensional subproblems. This avoids the combinatorial explosion of jointly tuning the parameters of multiple operations and greatly reduces the randomness of the initial exploration.
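The size argument can be made concrete by counting candidate configurations. The sketch below rests on simplifying assumptions taken from the embodiment: six operations, five candidate parameter values each, and a fixed operation order. In the one-step case, each operation is either unused or used with one value. The function names are hypothetical, and the count ignores the cost of evaluating each candidate.

```python
def joint_space_size(n_ops: int, n_params: int) -> int:
    """One-step search: every operation is unused or takes one of n_params values."""
    return (n_params + 1) ** n_ops

def decoupled_space_size(n_ops: int, n_params: int) -> int:
    """Two-stage search: n_params candidates per operation searched separately
    (stage 1), then one on/off state per operation (stage 2)."""
    return n_ops * n_params + 2 ** n_ops

print(joint_space_size(6, 5), decoupled_space_size(6, 5))  # 46656 94
```

If the execution order were also searched jointly, the one-step count would grow further, by up to a factor of 6! = 720.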

[0020] Step 4: Determine the individual parameter search space: Set the parameter list for each image enhancement operation in the set of candidate image enhancement operations.

[0021] In step 4, a parameter list is set for each image augmentation operation. Each operation has a fixed number of candidate parameters, for example five. Taking magnification as an example, parameters P1, P2, P3, P4, and P5 represent five candidate magnification factors that increase uniformly from 1.2 to 1.6, i.e., Pi = 1.2 + 0.1 × (i − 1) for i = 1, …, 5 (1.2, 1.3, 1.4, 1.5, 1.6).
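A parameter list as described in step 4 might be represented as below. Only the magnification values come from the embodiment. The lists for the other operations are hypothetical placeholders, and the dictionary name `PARAM_LISTS` is illustrative.

```python
import numpy as np

# Five evenly spaced magnification factors, 1.2 to 1.6 (from the embodiment)
magnify_factors = np.linspace(1.2, 1.6, 5).round(2).tolist()

# Hypothetical candidate lists for the other operations (five values each)
PARAM_LISTS = {
    "magnify": magnify_factors,
    "duplicate": [1, 2, 3, 4, 5],                     # number of copies (assumed)
    "salt_and_pepper": [0.01, 0.02, 0.05, 0.1, 0.2],  # noisy-pixel fraction (assumed)
    "brighten": [10, 20, 30, 40, 50],                 # added intensity (assumed)
}

# Each operation has the same fixed number of candidates
assert all(len(values) == 5 for values in PARAM_LISTS.values())
print(PARAM_LISTS["magnify"])
```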

[0022] Step 5: Perform individual parameter search: In the individual parameter search space, use the reinforcement learning controller to search the parameter space of each data augmentation operation separately, and use the detection results of the target detection model on the validation set as the evaluation index to obtain the optimal parameters of each image augmentation operation.

[0023] In step 5, the controller searches the parameter space of each data augmentation operation separately. If six data augmentation operations are used, six separate searches are required. Each search finds the optimal parameters of one operation for the target dataset, in preparation for the subsequent strategy combination search. For example, for image magnification, several magnification levels are set, and the reinforcement learning controller selects the level that performs best on the validation set according to the evaluation metric. Because this stage only seeks the optimal setting of one operation at a time, search space complexity is effectively reduced.
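The per-operation search can be sketched with a simple policy-gradient (REINFORCE) update over a categorical distribution of one operation's candidate parameters. To keep this hedged sketch short, it substitutes a softmax table for the recurrent controller. `search_single_op` and the `evaluate` callback are hypothetical. In practice, `evaluate` would train or fine-tune the detector with the given parameter and return its validation score (e.g., mAP), which is by far the most expensive part.

```python
import numpy as np

def search_single_op(params, evaluate, n_iters=200, lr=0.5, seed=0):
    """Stage-1 sketch: REINFORCE over one operation's candidate parameters.

    `evaluate(param)` is a hypothetical callback returning a validation score
    for a detector trained with that parameter.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(params))
    baseline = 0.0
    for _ in range(n_iters):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = rng.choice(len(params), p=probs)
        reward = evaluate(params[k])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward    # moving-average baseline
        grad = -probs                               # d log softmax_k / d logits ...
        grad[k] += 1.0                              # ... equals onehot(k) - probs
        logits += lr * advantage * grad
    return params[int(np.argmax(logits))]

# Toy stand-in for the validation score: only factor 1.4 helps
toy_score = lambda p: 1.0 if abs(p - 1.4) < 1e-9 else 0.0
print(search_single_op([1.2, 1.3, 1.4, 1.5, 1.6], toy_score))  # expected: 1.4
```

Running the function once per operation, with that operation's parameter list and its own `evaluate` callback, yields the per-operation optimal parameters that step 6 then fixes.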

[0024] Step 6: Determine the strategy combination search space: based on the optimal parameters of each image augmentation operation obtained in Step 5, arrange the image augmentation operations in a set order, and set a state S for each operation. In step 6, state S = 0 means that the operation is discarded (not used), and state S = 1 means that it is used.
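Once the stage-1 parameters are fixed, each operation becomes an "atomic" callable. A state vector S then selects which of them run, in the fixed order. The sketch below assumes the operations are plain functions whose parameters are bound with `functools.partial`. `apply_policy` and the toy arithmetic operations are hypothetical stand-ins for image transforms.

```python
from functools import partial

def apply_policy(image, ordered_ops, state):
    """Apply, in the fixed order, every operation whose state bit S is 1."""
    if len(state) != len(ordered_ops):
        raise ValueError("expected one state bit per operation")
    for op, s in zip(ordered_ops, state):
        if s == 1:          # S = 1: operation used; S = 0: operation discarded
            image = op(image)
    return image

def scale(x, factor):
    return x * factor

def shift(x, offset):
    return x + offset

# Parameters fixed by stage 1 (toy values), listed in the set order
ops = [partial(scale, factor=2), partial(shift, offset=3)]
print(apply_policy(1, ops, (1, 1)))   # (1 * 2) + 3 = 5
print(apply_policy(1, ops, (0, 1)))   # 1 + 3 = 4
```

Because the order is fixed, the stage-2 decision per operation reduces to a single bit.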

[0025] Step 7: In the policy combination search space, continue to use the reinforcement learning controller to search, and use the detection results of the object detection model on the validation set as the evaluation index to obtain the optimal policy combination.

[0026] In step 7, guided by the evaluation metric of the object detection model, the reinforcement learning controller tries different combinations of the image augmentation operations until its iterations finish, and then outputs the optimal policy combination. This stage also keeps search complexity low, because it only searches over policy combinations and does not re-search the optimal parameters of individual operations.
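The combination search can be sketched as follows: each state bit S_i is modelled as an independent Bernoulli variable with a learnable logit, updated by REINFORCE from the validation reward. This factorized policy stands in for the recurrent controller purely for brevity. `search_combination` and the `evaluate` callback are hypothetical. In practice, `evaluate` would train the detector with the augmentations whose bit is 1 and return validation mAP.

```python
import numpy as np

def search_combination(n_ops, evaluate, n_iters=300, lr=0.5, seed=0):
    """Stage-2 sketch: REINFORCE over independent on/off bits, one per operation.

    `evaluate(state)` is a hypothetical callback scoring a state tuple such as
    (1, 0, 1), where 1 means the fixed-parameter operation is applied.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_ops)                 # logit of P(S_i = 1)
    baseline = 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-theta))
        state = (rng.random(n_ops) < p).astype(int)
        reward = evaluate(tuple(state))
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        theta += lr * advantage * (state - p)   # grad of Bernoulli log-likelihood
    return tuple(int(t > 0) for t in theta)

# Toy reward: fraction of bits matching a hidden "good" combination
target = (1, 0, 1, 1, 0, 0)
toy = lambda s: sum(int(a) == b for a, b in zip(s, target)) / len(target)
print(search_combination(6, toy))
```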

[0027] Step 8: Use the optimal strategy combination obtained in Step 7 to perform data augmentation on the image dataset.

[0028] This reinforcement learning-based data augmentation method for small target detection uses a two-stage decoupled search to replace the original one-step search strategy. Compared with existing technologies, this invention greatly reduces the search time, enabling reinforcement learning to be effectively used in the field of data augmentation and effectively enriching the dataset. The optimal strategy combination selected by this method can effectively improve the accuracy of small target detection.

[0029] This reinforcement learning-based data augmentation method for small target detection addresses the efficiency bottleneck often encountered in traditional reinforcement learning applications for automatic data augmentation due to the large and complex search space. This invention effectively solves this problem through a carefully designed two-stage decoupled search, significantly simplifying the exploration dimensions of reinforcement learning. Based on this simplified search space, this invention successfully realizes the effective application of reinforcement learning in the field of target detection data augmentation. The method of this invention has advantages such as fast search speed, effective improvement of detection accuracy, and effective enrichment of the dataset, and is particularly suitable for improving the detection accuracy of small targets in resource-constrained scenarios.

[0030] This reinforcement learning-based data augmentation method for small target detection employs an image augmentation set that includes both image-level and bounding box-level augmentations. Image-level augmentations include zooming the entire image in and out. These directly address the core challenges of small targets: their small size, low resolution, and weak feature information in the original image. Intelligent zooming significantly improves the visual recognizability and spatial proportion of small targets, making key features easier for the detection network to capture and learn. It also helps simulate the appearance of targets at different observation distances or scales, which enhances the model's robustness to scale changes. Bounding box-level augmentations apply color and geometric operations to the object regions in the image, including adding noise, brightening the target, and duplication. For example, brightening the target region, adjusting contrast, color shifting, and adding specific patterns of noise simulate the effect of varied real-world lighting conditions and sensor noise on the appearance of small targets.

[0031] This reinforcement learning-based data augmentation method for small object detection introduces reinforcement learning into the search for data augmentation strategies. This offers significant advantages over manual design or random combination, especially in complex tasks such as object detection. The core of reinforcement learning is that an agent learns optimal decisions through interaction with an environment. In the data augmentation setting, the "environment" is the model's performance feedback on the validation set (e.g., mAP). The reinforcement learning controller learns and refines its strategy by repeatedly trying different combinations of augmentation operations (actions) and observing the resulting improvement in model performance (rewards). This fully automates the search for the best augmentation strategy and avoids tedious, inefficient manual trial and error. It also explores a vast operation space to discover effective combinations that are difficult for humans to design intuitively.

[0032] This reinforcement learning-based data augmentation method for small object detection uses an efficient two-stage reinforcement learning search strategy that significantly reduces the search complexity and resource consumption of augmentation policy search. The core innovation is to decouple the search process into two hierarchical stages, progressing from per-operation optima to an optimized combination of operations:

(a) Single-operation parameter fine-tuning: parameters are optimized independently for each candidate augmentation operation (e.g., scaling, noise addition, brightness adjustment). The reinforcement learning agent learns the optimal intensity, probability, or hyperparameter range of that single operation (e.g., scaling threshold, noise intensity distribution) through interaction with the environment, i.e., the accuracy feedback of the downstream object detection model on the validation set. This decomposes the high-dimensional search space into multiple low-dimensional subproblems, avoiding the combinatorial explosion of jointly tuning multiple operations and significantly reducing the randomness of initial exploration.

(b) Operation combination search with fixed parameters: based on the results of the first stage, the optimal parameters of each augmentation operation are fixed, and each operation is treated as an "atomic operation." The agent then learns how to combine these atomic operations, including operation selection, execution order, stacking logic, and application probability.

Advantage: the search space is reduced to discrete combinations of operation decisions rather than a hybrid continuous-discrete space. This greatly reduces the agent's exploration difficulty while preserving the performance potential of each operation.

[0033] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit the scope of protection of the invention. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on these embodiments, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can still combine, add, delete, or otherwise adjust the features of the various embodiments of the present invention according to the circumstances without conflict or creative effort, thereby obtaining different technical solutions that do not fundamentally depart from the concept of the present invention. These technical solutions also fall within the scope of protection of the present invention.

Claims

1. A data augmentation method for small target detection based on reinforcement learning, characterized in that it comprises the following steps: Step 1: Obtain small target images, divide them into training and validation sets according to the number of images and classification, and use the training set to train the target detection model; Step 2: Set the hyperparameters of the reinforcement learning controller and determine the set of image augmentation operations to be selected; Step 3: Adopt a two-stage decoupled search: decompose the search space and divide the search operation into individual parameter search and strategy combination search; Step 4: Determine the individual parameter search space: Define the parameter list for each image enhancement operation in the set of candidate image enhancement operations; Step 5: Perform individual parameter search: In the individual parameter search space, use the reinforcement learning controller to search the parameter space of each data augmentation operation individually, and use the detection results of the target detection model on the validation set as the evaluation index to obtain the optimal parameters of each image augmentation operation; Step 6: Determine the strategy combination search space: Based on the optimal parameters of each image enhancement operation obtained in Step 5, arrange the image enhancement operations in a set order, and set the state S for each image enhancement operation; Step 7: In the policy combination search space, continue to use the reinforcement learning controller to search, and use the detection results of the target detection model on the validation set as the evaluation index to obtain the optimal policy combination; Step 8: Use the optimal strategy combination obtained in Step 7 to perform data augmentation on the image dataset.

2. The reinforcement learning-based data augmentation method for small target detection as described in claim 1, characterized in that: In step 1, the target detection model uses the YOLOv8 network.

3. The reinforcement learning-based data augmentation method for small target detection as described in claim 1 or 2, characterized in that: In step 2, the set of image enhancement operations to be selected includes two or more of the following: magnification, duplication, noise addition, highlighting, flipping, image contrast enhancement, and cropping.

4. The reinforcement learning-based data augmentation method for small target detection as described in claim 1 or 2, characterized in that: In step 6, state S = 0 means that the image enhancement operation is discarded (not used), and state S = 1 means that it is used.