An infrared small target detection method based on reinforcement learning
By combining reinforcement learning and convolutional neural networks, and using the DDQN network for infrared small target detection, the shortcomings of deep learning in control decision-making are addressed, achieving efficient infrared small target detection and demonstrating the applicability of reinforcement learning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF ELECTRONICS SCI & TECH OF CHINA
- Filing Date
- 2024-07-10
- Publication Date
- 2026-06-30
AI Technical Summary
While deep learning methods have strong perception capabilities in infrared small target detection, they handle the control and decision-making problems that arise in such scenes poorly, which makes high-precision, real-time detection difficult to achieve.
Reinforcement learning is combined with a convolutional neural network, and a DDQN network is used for end-to-end infrared small target detection. The agent is trained by designing the action space, reward function, and hyperparameters, which achieves localization and detection of infrared small targets.
It achieves efficient detection of small infrared targets in complex scenarios, solves the problems of limited samples and low training accuracy, and demonstrates the applicability of reinforcement learning in small infrared target detection.
Figure CN118781327B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a method for detecting small infrared targets based on reinforcement learning, belonging to the field of infrared small target detection using convolutional neural network models.
Background Technology
[0002] Infrared small target detection has always been a key technology in infrared search and track systems and is widely used in precision guidance, anti-missile interception, and military early warning. Because of the limitations of infrared imaging and the long detection distances involved, threat targets often appear in the field of view as small targets lacking shape and texture features. Background clutter from diverse scenes complicates matters further. Designing an algorithm that combines high detection accuracy, robustness across scenarios, and good real-time performance is therefore extremely challenging. Detection technology in this field has evolved from traditional hand-crafted feature extraction algorithms to machine learning and deep learning. Deep learning methods perceive the semantic features of a target well, but they are ineffective for control and decision-making problems in general scenarios. This method applies reinforcement learning algorithms to small target detection in infrared scenes, leveraging the decision-making capabilities of reinforcement learning for end-to-end infrared small target detection. With the rapid development of reinforcement learning, reinforcement learning-based target detection methods have gradually been proposed. Their advantage is that they do not require a large amount of labeled data; training can be performed by specifying only the target task and the reward function, which enables perception and decision-making in complex systems.
[0003] Deep learning-based object detection techniques have achieved remarkable results and perform impressively in many classification and recognition tasks. However, while deep learning methods excel at perceiving object features, they cannot handle control and decision-making problems in certain scenarios. Reinforcement learning, another research hotspot in machine learning, is widely applied in game simulation, machine control, intelligent recommendation, and natural language processing. Its core idea is to maximize the cumulative reward obtained from the environment through continuous interaction between the agent and the environment, ultimately learning a better policy for achieving the goal. Reinforcement learning methods are therefore adept at learning strategies for solving problems; the learning process resembles the way humans reason about complex problems, and it is considered an important path toward general artificial intelligence. Combining deep learning and reinforcement learning provides an effective technical approach to the perception and decision-making problems of complex systems. Deep reinforcement learning methods have emerged as a result: they combine the perception capabilities of deep learning with the decision-making capabilities of reinforcement learning in a general form, and achieve direct control from raw high-dimensional input to low-dimensional output through end-to-end learning. Against this backdrop, using deep reinforcement learning to solve the localization and recognition problems in infrared small target detection has significant research value.
Summary of the Invention
[0004] The purpose of this invention is to provide an infrared small target detection method based on reinforcement learning, which solves the problem that while deep learning methods have a strong ability to perceive target features, they are powerless to deal with control decision problems in some scenarios.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0006] A reinforcement learning-based method for detecting small infrared targets includes the following steps:
[0007] Step 1: Obtain the open-source infrared small target dataset and annotation information.
[0008] Step 2: Label the existing images with small targets.
[0009] Step 2.1: Based on the open-source infrared small target dataset, use the online annotation tool MakeSense to annotate the infrared small targets in the image;
[0010] Step 2.2: All small infrared targets are labeled as "Target," which is the only category in the entire dataset. After labeling, the labels are exported as XML files whose names correspond one-to-one with the image file names.
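As a rough illustration of how such label files might be read back, the sketch below parses one annotation. The Pascal VOC-style element names (`object`, `name`, `bndbox`, `xmin`, ...) and the sample file name are assumptions; the text only states that the labels are XML files with the single class "Target".

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one image. The VOC-style layout is an assumption,
# since the text only says that the labels are stored as XML.
SAMPLE_XML = """
<annotation>
  <filename>000001.png</filename>
  <object>
    <name>Target</name>
    <bndbox><xmin>120</xmin><ymin>98</ymin><xmax>127</xmax><ymax>104</ymax></bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Return (filename, [(x1, y1, x2, y2), ...]) for every "Target" object."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "Target":
            continue  # the dataset is described as single-class
        bb = obj.find("bndbox")
        boxes.append(tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")))
    return filename, boxes

filename, boxes = parse_annotation(SAMPLE_XML)
print(filename, boxes)
```

The returned boxes use the same (x1, y1, x2, y2) corner convention that the later steps use for the bounding box.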
[0011] Step 3: Input the image into the pre-trained convolutional neural network. Here, ResNet50 is used as the feature extraction network to extract the state values required by the agent.
[0012] Step 3.1: The input region is first resized to a specific size and processed by a pre-trained convolutional neural network. Instead of learning the full feature hierarchy of the convolutional neural network, the weights of a pre-trained ResNet50 model are utilized.
[0013] Step 3.2: Using pre-trained weights speeds up the parameter updates of the DDQN network, and ResNet50 serves only as a feedforward feature extractor. In addition, the pre-trained weights were learned on a larger dataset, which benefits feature extraction.
[0014] Step 3.3: The output of ResNet50 is concatenated with the action history vector, and the result is the state representation, which is passed to the next stage.
[0015] Step 4: The state values obtained from ResNet50 are used as the input values of the DDQN network. Training uses two networks, the behavior network and the policy (target) network. The output is the value of each action in the action space, and the agent selects the action with the highest action value.
[0016] In DQN networks, the Q-value is updated by selecting the action corresponding to the maximum Q-value. The target Q-value is calculated using the same network to select and evaluate actions, which may lead to an overestimation of the Q-value. The formula for updating the Q-value is:
[0017] $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$
[0018] where:
[0019] Q(s, a) represents the expected reward obtained by choosing action a in state s;
[0020] α represents the learning rate, which controls the degree to which Q is updated;
[0021] r represents the immediate reward obtained from performing action a;
[0022] γ represents the discount factor, which indicates the decay rate of future rewards. It takes a value between 0 and 1. The larger the γ, the more important the future rewards are.
[0023] Step 4.1: Employ the DDQN network. DDQN uses two distinct networks to separately select actions and evaluate their value: a Behavior Network for action selection and a Target Network for evaluating the Q-value of the selected action. The Q-value of the DDQN network is updated as follows:
[0024] $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\,r_t + \gamma\, Q(s_{t+1}, \arg\max_{a'} Q(s_{t+1},a';\theta);\theta^{-}) - Q(s_t,a_t)\,]$
[0025] The parameters that differ from those of the DQN network are: $\theta$, which represents the parameters of the current network, and $\theta^{-}$, which represents the parameters of the target network.
[0026] Step 4.2: The architecture of the Behavior Network is the same as that of the Policy Network (Target Network), but the weights of the Policy Network (Target Network) remain unchanged for a period of time. Every fixed number of steps, the weights of the Behavior Network are copied to the Policy Network (Target Network).
[0027] Step 5: By setting the action space, reward function, and hyperparameters, the network is trained. Finally, after analyzing 11 to 25 regions in the image, the agent can locate a single small infrared target object and detect the target without using region candidate boxes for target localization.
[0028] Step 5.1: Design eight actions in the action space. The magnitude of each bounding box transformation is given by:
[0029] $\alpha_w = \alpha\,(x_2 - x_1), \qquad \alpha_h = \alpha\,(y_2 - y_1)$
[0030] The coordinates of the top-left corner of the bounding box are $(x_1, y_1)$, and the coordinates of the bottom-right corner are $(x_2, y_2)$. The hyperparameter $\alpha$ controls the magnitude of the bounding box transformation.
[0031] Step 5.2: Design the reward function for the agent. This includes the reward function during the iteration process and the termination condition reward function. The specific reward function formula is as follows:
[0032] Termination condition reward function:
[0033]
[0034] Reward function during iteration:
[0035]
[0036] Where δ represents the reward size for the termination condition, g represents the ground-truth (label) box, b represents the current prediction box, and b′ represents the next prediction box.
[0037] Step 5.3: The training strategy followed is ρ-greedy, gradually shifting from exploration to exploitation based on the value of ρ. During exploration, the agent randomly selects actions, observes different transitions, and collects different sets of experiences. During exploitation, the agent greedily selects actions according to the learned policy and learns from its successes and mistakes.
[0038] In summary, due to the adoption of the above technical solution, the beneficial effects of the present invention are:
[0039] (1) Infrared small target detection is performed end-to-end using the sequential decision-making process of reinforcement learning. It is mainly used to search for target regions, replacing the classic detector, and to find regions of interest (ROI) in high-resolution infrared images. The reinforcement learning algorithm is combined with the convolutional neural network to form the DDQN network, which is used for region selection and bounding box refinement, and finally realizes the detection of infrared small targets.
[0040] (2) The reinforcement learning-based infrared small target detection method provides a new technical approach alongside the current mainstream infrared small target detection methods, which rely on deep learning feature extraction. Built on reinforcement learning and tailored to the infrared small target application scenario, the algorithm framework addresses the problems of insufficient samples and low training accuracy in infrared small target detection, and verifies the applicability of reinforcement learning to tasks in this field.
Attached Figure Description
[0041] The present invention will be described by way of example and with reference to the accompanying drawings, wherein:
[0042] Figure 1 is a flowchart of an infrared small target detection method based on reinforcement learning;
[0043] Figure 2 is a detection framework diagram for small infrared targets;
[0044] Figure 3 is a visualization of the network model, which includes the feature extraction part of the ResNet50 backbone network and the DDQN reinforcement learning network;
[0045] Figure 4 shows the eight different actions defined in the agent's action space;
[0046] Figure 5 shows the results of the agent's iterative search for the target, from the initial state to the final state.
Detailed Implementation
[0047] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0048] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0049] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0050] Example:
[0051] Figure 1 and Figure 2 show, respectively, a flowchart of the reinforcement learning-based infrared small target detection method and a detection framework diagram for infrared small target scenarios. The overall detection steps are as follows:
[0052] Step 1: Obtain the open-source infrared small target dataset and annotation information;
[0053] Step 2: Label the existing images with small targets;
[0054] Step 3: Input the image into the pre-trained convolutional neural network. Here, ResNet50 is used as the feature extraction network to extract the state values required by the agent.
[0055] Step 4: The state values obtained from ResNet50 are used as the input values of the DDQN network. Training uses two networks, the behavior network and the policy (target) network. The output is the value of each action in the action space, and the agent selects the action with the highest action value.
[0056] Step 5: By setting the action space, reward function, and hyperparameters, the network is trained. Finally, after analyzing 11 to 25 regions in the image, the agent can locate a single small infrared target object and detect the target without using region candidate boxes for target localization.
[0057] Figure 3 is a visualization of the network model, which includes the feature extraction part of the ResNet50 backbone network and the DDQN reinforcement learning network.
[0058] For the feature extraction part of the ResNet50 backbone network, the specific steps are as follows:
[0059] Step 3.1: The input region is first resized to a specific size and processed by a pre-trained convolutional neural network. Instead of learning the full feature hierarchy of the convolutional neural network, the weights of a pre-trained ResNet50 model are utilized.
[0060] Step 3.2: Using pre-trained weights speeds up the parameter updates of the DDQN network, and ResNet50 serves only as a feedforward feature extractor. In addition, the pre-trained weights were learned on a larger dataset, which benefits feature extraction.
[0061] Step 3.3: The output of ResNet50 is concatenated with the action history vector, and the result is the state representation, which is passed to the next stage.
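The concatenation in Step 3.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the one-hot encoding of past actions, the history length of 4, and the 2048-dimensional pooled feature vector are all assumptions, and a zero list stands in for the real ResNet50 output.

```python
NUM_ACTIONS = 8      # from the text: eight actions in the action space
HISTORY_LEN = 4      # assumption: how many past actions are remembered
FEATURE_DIM = 2048   # assumption: length of the pooled ResNet50 feature vector

def one_hot(action, num_actions=NUM_ACTIONS):
    vec = [0.0] * num_actions
    vec[action] = 1.0
    return vec

def action_history_vector(past_actions, history_len=HISTORY_LEN):
    """Encode the most recent actions as concatenated one-hot vectors, zero-padded."""
    recent = list(past_actions)[-history_len:]
    vec = []
    for a in recent:
        vec.extend(one_hot(a))
    # Pad with zeros when fewer than history_len actions have been taken.
    vec.extend([0.0] * (NUM_ACTIONS * (history_len - len(recent))))
    return vec

def build_state(features, past_actions):
    """Concatenate the feature vector with the action history vector."""
    return list(features) + action_history_vector(past_actions)

features = [0.0] * FEATURE_DIM   # stand-in for the ResNet50 output
state = build_state(features, past_actions=[2, 5])
print(len(state))  # 2048 + 4 * 8 = 2080
```

Keeping the history inside the state lets the agent distinguish regions it has already visited from new ones, which is the usual motivation for this kind of encoding.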
[0062] For the DDQN reinforcement learning network part, the specific steps are as follows:
[0063] Step 4.1: In DQN, the Q-value is updated by selecting the action corresponding to the maximum Q-value. The target Q-value is calculated using the same network to select and evaluate actions, which may lead to an overestimation of the Q-value. The formula for updating the Q-value is:
[0064] $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$
[0065] where:
[0066] Q(s, a) represents the expected reward obtained by choosing action a in state s;
[0067] α represents the learning rate, which controls the degree to which Q is updated;
[0068] r represents the immediate reward obtained from performing action a;
[0069] γ represents the discount factor, which indicates the decay rate of future rewards. It takes a value between 0 and 1. The larger the γ, the more important the future rewards are.
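A single application of this update can be traced numerically. The values below (α = 0.5, γ = 0.9, r = 1, and the three next-state Q-values) are arbitrary illustrative choices, not figures from the invention.

```python
def q_update(q_sa, reward, next_q_values, alpha, gamma):
    """One Q-learning step: Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * max(next_q_values)
    return q_sa + alpha * (td_target - q_sa)

new_q = q_update(q_sa=0.0, reward=1.0, next_q_values=[0.5, 2.0, 1.0],
                 alpha=0.5, gamma=0.9)
print(new_q)  # target is 1 + 0.9 * 2.0 = 2.8, so Q moves halfway from 0 to 2.8
```

Because the same estimates supply both the maximizing action and its value, any upward noise in the next-state values is carried straight into the target, which is the overestimation the text refers to.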
[0070] Step 4.2: Employ the DDQN network. DDQN uses two distinct networks to separately select actions and evaluate their value: a Behavior Network for action selection and a Target Network for evaluating the Q-value of the selected action. The Q-value update for the DDQN network is as follows:
[0071] $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\,r_t + \gamma\, Q(s_{t+1}, \arg\max_{a'} Q(s_{t+1},a';\theta);\theta^{-}) - Q(s_t,a_t)\,]$
[0072] The parameters that differ from those of the DQN network are: $\theta$, which represents the parameters of the current network, and $\theta^{-}$, which represents the parameters of the target network.
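The difference between the two targets can be seen in a small sketch. The Q-value lists here are made-up numbers standing in for network outputs; only the selection and evaluation logic follows the formulas above.

```python
def dqn_target(reward, next_q, gamma):
    """DQN: the same estimates both select and evaluate the next action."""
    return reward + gamma * max(next_q)

def ddqn_target(reward, next_q_online, next_q_target, gamma):
    """DDQN: the current network (theta) selects, the target network (theta^-) evaluates."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

next_q_online = [1.0, 3.0, 2.0]   # stand-in for Q(s_{t+1}, a'; theta)
next_q_target = [0.5, 1.5, 4.0]   # stand-in for Q(s_{t+1}, a'; theta^-)

y_dqn = dqn_target(1.0, next_q_target, gamma=0.9)
y_ddqn = ddqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
print(y_dqn, y_ddqn)
```

In this example the current network prefers action 1, so the DDQN target uses the target network's value for action 1 (1.5) rather than its own maximum (4.0), which yields the smaller target.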
[0073] Step 4.3: The architecture of the Behavior Network is the same as that of the Policy Network (Target Network), but the weights of the Policy Network (Target Network) remain unchanged for a period of time. Every fixed number of steps, the weights of the Behavior Network are copied to the Policy Network (Target Network).
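The periodic weight copy can be illustrated with a toy agent. The class name `TwoNetworkAgent`, the dictionary of weights, and the interval of 3 steps are hypothetical; a real implementation would hold network parameters and apply gradient updates instead of adding a constant.

```python
import copy

class TwoNetworkAgent:
    """Holds behavior-network weights and a periodically refreshed frozen copy."""

    def __init__(self, weights, sync_every):
        self.behavior = weights                # updated at every training step
        self.target = copy.deepcopy(weights)   # frozen between synchronizations
        self.sync_every = sync_every           # assumption: a fixed step interval
        self.steps = 0

    def train_step(self, delta):
        # Stand-in for a gradient update of the behavior network.
        self.behavior = {k: v + delta for k, v in self.behavior.items()}
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = copy.deepcopy(self.behavior)  # hard copy of the weights

agent = TwoNetworkAgent({"w": 0.0}, sync_every=3)
history = []
for _ in range(4):
    agent.train_step(delta=1.0)
    history.append((agent.behavior["w"], agent.target["w"]))
print(history)
```

The target copy changes only at step 3, so the values used to build training targets stay fixed while the behavior network keeps learning.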
[0074] Figure 4 shows the eight different actions defined for the agent's action space. They are defined as follows:
[0075] Step 5.1: Design eight actions in the action space. The magnitude of each bounding box transformation is given by:
[0076] $\alpha_w = \alpha\,(x_2 - x_1), \qquad \alpha_h = \alpha\,(y_2 - y_1)$
[0077] The coordinates of the top-left corner of the bounding box are $(x_1, y_1)$, and the coordinates of the bottom-right corner are $(x_2, y_2)$. The hyperparameter $\alpha$ controls the magnitude of the bounding box transformation.
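The sketch below shows how such transformations might be applied to a box. The text does not list the eight actions (they appear only in Figure 4), so the set used here, four translations plus four scale and aspect changes, is an assumption borrowed from common active-localization setups. The value α = 0.2 and the image convention that y increases downward are also assumptions.

```python
# Assumed action set; the actual eight actions are defined only in Figure 4.
ACTIONS = ("left", "right", "up", "down", "bigger", "smaller", "fatter", "taller")

def apply_action(box, action, alpha=0.2):
    """Transform (x1, y1, x2, y2) using alpha_w = alpha*(x2-x1), alpha_h = alpha*(y2-y1)."""
    x1, y1, x2, y2 = box
    aw = alpha * (x2 - x1)
    ah = alpha * (y2 - y1)
    if action == "left":
        x1, x2 = x1 - aw, x2 - aw
    elif action == "right":
        x1, x2 = x1 + aw, x2 + aw
    elif action == "up":            # y grows downward in image coordinates
        y1, y2 = y1 - ah, y2 - ah
    elif action == "down":
        y1, y2 = y1 + ah, y2 + ah
    elif action == "bigger":
        x1, y1, x2, y2 = x1 - aw / 2, y1 - ah / 2, x2 + aw / 2, y2 + ah / 2
    elif action == "smaller":
        x1, y1, x2, y2 = x1 + aw / 2, y1 + ah / 2, x2 - aw / 2, y2 - ah / 2
    elif action == "fatter":        # reduce height, keep width
        y1, y2 = y1 + ah / 2, y2 - ah / 2
    elif action == "taller":        # reduce width, keep height
        x1, x2 = x1 + aw / 2, x2 - aw / 2
    else:
        raise ValueError(f"unknown action: {action}")
    return (x1, y1, x2, y2)

box = (0.0, 0.0, 100.0, 50.0)
moved = apply_action(box, "right")     # shifts the box by alpha_w
shrunk = apply_action(box, "smaller")  # pulls each side in by half of alpha_w / alpha_h
print(moved, shrunk)
```

Because the step sizes scale with the current width and height, moves become finer as the box shrinks around a small target.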
[0078] Figure 5 shows the result of the agent's iterative search for the target, from the initial state to the final state. The definition of the reward function and the steps of the training strategy are as follows:
[0079] Step 5.2: Design the reward function for the agent. This includes the reward function during the iteration process and the termination condition reward function. The specific reward function formula is as follows:
[0080] Termination condition reward function:
[0081]
[0082] Reward function during iteration:
[0083]
[0084] Where δ represents the reward size for the termination condition, g represents the ground-truth (label) box, b represents the current prediction box, and b′ represents the next prediction box.
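The reward formulas themselves are not reproduced in this text, so the following is only a sketch of one common form that fits the symbols δ, g, b, and b′. It assumes both rewards are based on intersection-over-union (IoU): the step reward is the sign of the IoU change, and the termination reward is ±δ depending on whether the final IoU reaches a threshold `tau`. The IoU basis, the threshold, and the values δ = 3 and `tau` = 0.5 are all assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def step_reward(b, b_next, g):
    """Assumed form: +1 if the move increases IoU with g, otherwise -1."""
    return 1.0 if iou(b_next, g) > iou(b, g) else -1.0

def terminal_reward(b, g, delta=3.0, tau=0.5):
    """Assumed form: +delta if the final IoU reaches tau, otherwise -delta."""
    return delta if iou(b, g) >= tau else -delta

g = (10.0, 10.0, 20.0, 20.0)       # ground-truth (label) box
b = (0.0, 0.0, 20.0, 20.0)         # current prediction box
b_next = (5.0, 5.0, 20.0, 20.0)    # next prediction box

print(iou(b, g), iou(b_next, g))
print(step_reward(b, b_next, g), terminal_reward(b_next, g), terminal_reward(g, g))
```

Here the move from b to b′ raises the IoU from 0.25 to about 0.44, so the step is rewarded, while stopping at b′ would still be penalized under the assumed threshold.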
[0085] Step 5.3: The training strategy followed is ρ-greedy, gradually shifting from exploration to exploitation based on the value of ρ. During exploration, the agent randomly selects actions, observes different transitions, and collects different sets of experiences. During exploitation, the agent greedily selects actions according to the learned policy and learns from its successes and mistakes.
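A minimal sketch of this selection rule follows. The linear annealing schedule, its start and end values, and the number of decay episodes are assumptions; the text only says that ρ governs a gradual shift from exploration to exploitation.

```python
import random

def rho_schedule(episode, rho_start=1.0, rho_end=0.1, decay_episodes=10):
    """Linearly anneal rho from rho_start to rho_end (the schedule is an assumption)."""
    frac = min(episode / decay_episodes, 1.0)
    return rho_start + frac * (rho_end - rho_start)

def select_action(q_values, rho, rng=random):
    """With probability rho explore (random action); otherwise exploit (argmax Q)."""
    if rng.random() < rho:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.1, 0.7, 0.3, 0.0, 0.2, 0.5, 0.4, 0.6]  # made-up values for 8 actions
rhos = [rho_schedule(e) for e in (0, 5, 20)]
greedy_action = select_action(q_values, rho=0.0)      # pure exploitation
print(rhos, greedy_action)
```

With ρ = 0 the agent always takes the highest-valued action, and with ρ = 1 every action is drawn at random, so the schedule interpolates between the two regimes over training.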
[0086] This invention introduces the reinforcement learning framework and algorithms into infrared small target detection. From dataset annotation and backbone feature extraction to training the agent with the DDQN network, it combines the characteristics of infrared small target detection with the advantages of the sequential decision process of reinforcement learning, and efficiently searches for and locates weak targets in infrared images.
[0087] Finally, it should be noted that the above embodiments are merely preferred embodiments of the present invention, used to illustrate the technical solutions of the present invention, and are not intended to limit it, much less limit the patent scope of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention; that is to say, any changes or refinements made to the main design idea and spirit of the present invention that have no substantial meaning, but whose technical problems are still consistent with the present invention, should be included within the protection scope of the present invention; in addition, the direct or indirect application of the technical solutions of the present invention to other related technical fields are similarly included within the patent protection scope of the present invention.
Claims
1. A method for detecting small infrared targets based on reinforcement learning, characterized in that the main steps include: Step 1: Obtain the open-source infrared small target dataset and annotation information; Step 2: Label the existing images with small targets; Step 3: Input the image into the pre-trained convolutional neural network, which is a ResNet50; the ResNet50 is used as a feature extraction network to extract the state values required by the agent; Step 4: The state values obtained from ResNet50 are used as input values for the DDQN network; the DDQN network includes a behavior network and a policy network, both of which are used for training; the output is each action in the action space, and the agent selects the action with the highest action value function; Step 5: Train the DDQN network by setting the action space, reward function, and hyperparameters; finally, after analyzing 11 to 25 regions in the image, the agent can locate a single small infrared target object and detect the target without using region candidate boxes for target localization. The specific steps for step 5 are as follows: Step 5.1: Design eight actions in the action space, where the bounding box transformation magnitude is given by: $\alpha_w = \alpha\,(x_2 - x_1)$, $\alpha_h = \alpha\,(y_2 - y_1)$; the coordinates of the top-left corner of the bounding box are $(x_1, y_1)$, the coordinates of the bottom-right corner are $(x_2, y_2)$, and the hyperparameter $\alpha$ controls the magnitude of the bounding box transformation; Step 5.2: Design the reward function for the agent, including the reward function during the iteration process and the reward function for the termination condition; the termination condition reward function is expressed as: [formula not reproduced]; the reward function during the iteration process is expressed as: [formula not reproduced]; where δ indicates the size of the reward for the termination condition, g indicates the label box, b indicates the current prediction box, and b′ indicates the next prediction box; Step 5.3: The strategy followed during training is ρ-greedy; according to the value of ρ, training gradually shifts from exploration to exploitation. In the exploration process, the agent randomly selects actions, observes different transitions, and collects different sets of experience. In the exploitation process, the agent greedily selects actions based on the learned policy and learns from its own successes and mistakes.
2. The infrared small target detection method based on reinforcement learning according to claim 1, characterized in that, The specific steps for step 2 are as follows: Step 2.1: Based on the open-source infrared small target dataset, use the online annotation tool MakeSense to annotate the infrared small targets in the image; Step 2.2: All infrared small targets are labeled as "Target". There is only this category in the entire dataset. After the labeling is completed, the label file is output as an XML format file, and the image file names correspond one by one.
3. The infrared small target detection method based on reinforcement learning according to claim 1, characterized in that, The specific steps for step 3 are as follows: Step 3.1: The input region is first resized to a specific size and then processed by a pre-trained convolutional neural network; Step 3.2: The output of the convolutional neural network is concatenated with the action history vector, and the output is a representation of the state.
4. The infrared small target detection method based on reinforcement learning according to claim 1, characterized in that the specific steps for step 4 are as follows: Step 4.1: The DDQN network uses two different networks to separate action selection and action evaluation: a behavior network for action selection and a policy network for evaluating the Q-value of the selected action; the Q-value update of the DDQN network is as follows: $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\,r_t + \gamma\, Q(s_{t+1}, \arg\max_{a'} Q(s_{t+1},a';\theta);\theta^{-}) - Q(s_t,a_t)\,]$; where $Q(s_t,a_t)$ indicates the expected return that can be obtained by selecting action $a_t$ in state $s_t$; $\alpha$ represents the learning rate, controlling the degree to which $Q$ is updated; $r_t$ indicates the immediate reward received for executing action $a_t$; $\gamma$ represents the discount factor, indicating the rate of decay of future rewards, with a value between 0 and 1, where a larger value gives future rewards more importance; $\theta$ represents the parameters of the current network; $\theta^{-}$ represents the parameters of the policy network; Step 4.2: The architecture of the behavior network is the same as that of the policy network, but the weights of the policy network remain unchanged for a period of time. Every fixed number of steps, the weights of the behavior network are copied to the policy network.