Air combat behavior modeling method based on fitting reinforcement learning

A modeling method based on reinforcement learning technology, applied in the field of computer simulation, which can solve problems such as the long convergence process of traditional reinforcement learning

Inactive Publication Date: 2015-04-01
BEIHANG UNIV


Problems solved by technology

[0004] However, if the dimension of the state space is high, even if the number of basic units in each dimension is limited, their total number will ...



Examples


Embodiment

[0097] Step 1: Perform data sampling.

[0098] Step 101: Establish a combat simulation involving red and blue aircraft, in which both warring parties adopt a traditional combat decision-making method, the max-min method. Taking the red side as an example, the decision process can be described as: for any situation x, choose the action that maximizes the red side's immediate reward S(x) while minimizing the blue side's immediate reward. The blue side makes its decision in the same way. The sampling results are shown in Figure 5.
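The max-min rule described above can be sketched as follows. This is an illustrative sketch only: the action sets and the reward table standing in for S(x) are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the max-min decision rule of Step 101.
# S(x, a_r, a_b) is an assumed immediate-reward function of the situation x
# and the red/blue actions; the toy action names below are illustrative.

def max_min_decision(x, red_actions, blue_actions, S):
    """Pick the red action that maximizes the worst-case immediate reward."""
    best_action, best_value = None, float("-inf")
    for a_r in red_actions:
        # Assume blue responds with the action that minimizes red's reward.
        worst = min(S(x, a_r, a_b) for a_b in blue_actions)
        if worst > best_value:
            best_action, best_value = a_r, worst
    return best_action

# Toy usage: the situation x is ignored by this dummy reward table.
reward = {("climb", "turn"): 2.0, ("climb", "dive"): 1.0,
          ("dive", "turn"): 3.0, ("dive", "dive"): -1.0}
S = lambda x, a_r, a_b: reward[(a_r, a_b)]
print(max_min_decision(None, ["climb", "dive"], ["turn", "dive"], S))  # climb
```

The blue side would run the symmetric computation with the roles of the two reward terms swapped.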

[0099] Step 102: Record the combat trajectory generated by the combat simulation, and obtain a set of trajectory sampling points.
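A trajectory record like the one in Step 102 might be structured as below. The field names and sample values are assumptions for illustration; the patent does not specify the record layout.

```python
# Hypothetical trajectory record for Step 102; field names are assumptions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Sample:
    state: Tuple[float, ...]       # situation features x_t
    action: str                    # action taken a_t
    reward: float                  # immediate reward r_t
    next_state: Tuple[float, ...]  # resulting situation x_{t+1}

@dataclass
class Trajectory:
    samples: List[Sample] = field(default_factory=list)

    def record(self, state, action, reward, next_state):
        self.samples.append(Sample(state, action, reward, next_state))

# Toy usage with made-up situation features.
traj = Trajectory()
traj.record((0.0, 1.0), "climb", 0.5, (0.1, 1.2))
print(len(traj.samples))  # 1
```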

[0100] Step 2: Utility function fitting.

[0101] Step 201: Establish a feature set, as shown in Table 1.

[0102] Step 202: Perform utility function fitting; the process is shown in Figure 2.
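The abstract describes approximating the utility function by Bellman iteration combined with least-squares fitting. A minimal sketch of that idea (fitted value iteration with a linear feature model) is below; the feature map, discount factor, and toy two-state transition samples are all assumptions, not the patent's feature set from Table 1.

```python
# Minimal sketch of Bellman iteration + least-squares utility fitting.
# The feature map phi, discount gamma, and samples are illustrative assumptions.
import numpy as np

def fitted_value_iteration(samples, phi, gamma=0.9, iters=50):
    """samples: list of (x, r, x_next); phi: feature map x -> np.ndarray."""
    X = np.array([phi(x) for x, _, _ in samples])      # features at x
    Xn = np.array([phi(xn) for _, _, xn in samples])   # features at x'
    r = np.array([rew for _, rew, _ in samples])
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # Bellman backup targets using the current fitted utility function.
        y = r + gamma * Xn @ w
        # Least-squares fit of the utility weights to the targets.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy chain: state 1 yields reward 1 and moves to state 0, which self-loops
# with zero reward. One-hot features, so w holds the fitted utilities.
samples = [((1,), 1.0, (0,)), ((0,), 0.0, (0,))]
phi = lambda x: np.eye(2)[x[0]]
w = fitted_value_iteration(samples, phi)
print(np.round(w, 6))  # [0. 1.]
```

On this toy problem the fit converges to utility 1 for state 1 and 0 for the absorbing state, matching the exact Bellman fixed point.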

[0103] Step 3: Make operational decisions.

[0104] Through the forward testing method, the final a...
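The forward-testing idea, as described in the abstract, uses the fitted utility function to score the forecasted result of each candidate action and then executes the best-scoring one. The sketch below is a hedged interpretation; `predict` and `utility` are placeholder assumptions standing in for the simulation forecast and the fitted utility function.

```python
# Hedged sketch of the forward-testing decision step (Step 3). Each candidate
# action is simulated one step ahead and scored with the fitted utility
# function; predict() and utility() are placeholder assumptions.
def forward_test_decide(x, actions, predict, utility):
    """Return the action whose forecasted next situation scores highest."""
    return max(actions, key=lambda a: utility(predict(x, a)))

# Toy usage with a 1-D situation and a linear utility.
predict = lambda x, a: x + {"climb": 1.0, "dive": -1.0}[a]
utility = lambda x: 2.0 * x
print(forward_test_decide(0.0, ["climb", "dive"], predict, utility))  # climb
```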



Abstract

The invention provides an air combat behavior modeling method based on fitting reinforcement learning, realizing intelligent decision-making of tactical actions in virtual air combat simulation. The method comprises the steps of: sampling trajectories of the aircraft combat process; fitting utility functions over the state space, approximating them by Bellman iteration and least-squares fitting; and making operational decisions, using the fitted utility functions to make action decisions in a forecasting process and determining the finally executed action according to the forecasted execution results. Compared with traditional methods, the fitting and acquisition efficiency of the utility functions is effectively improved, and the optimal action strategy can be acquired more rapidly.

Description

Technical field

[0001] The invention belongs to the technical field of computer simulation, and in particular relates to a method for realizing intelligent air-combat decision-making tasks for aircraft.

Background technique

[0002] As the tasks undertaken by UAV systems on the modern battlefield become more and more complex, the requirements on the intelligent decision-making level of UAVs also grow higher and higher; air combat decision-making is undoubtedly one of the most difficult such tasks.

[0003] Reinforcement learning is a method in which the learning agent interacts with its environment through "trial and error" and obtains its optimal action strategy by accumulating immediate rewards. However, in the traditional reinforcement learning process, in order to make the action strategy converge effectively, a common approach is to discretize each dimension of the state space to obtain "limited" basic stat...
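The discretization problem noted in [0003]-[0004] can be illustrated with a quick calculation: even a modest number of basic units per dimension grows exponentially with the dimension of the state space. The figure of 10 units per dimension below is an illustrative assumption.

```python
# Illustration of the state-space explosion behind [0003]-[0004]: with a
# fixed discretization per dimension, the number of basic units grows
# exponentially in the state-space dimension. 10 units/dim is an assumption.
bins_per_dim = 10
counts = {dims: bins_per_dim ** dims for dims in (2, 4, 6)}
for dims, n in counts.items():
    print(f"{dims} dimensions -> {n} basic units")
```

This exponential growth is what motivates fitting a utility function over the continuous state space instead of enumerating discrete basic states.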

Claims


Application Information

IPC(8): G06F17/50
Inventors: 马耀飞, 马小乐, 宋晓, 龚光红
Owner BEIHANG UNIV