
Planetary soft landing control method and system based on reinforcement learning, and storage medium

A reinforcement-learning-based control technology applied in the field of deep space exploration. It addresses the problems that existing methods cannot guarantee an optimal guidance law, rely on complex models, and are difficult to train to convergence, and achieves strong real-time performance, optimized control performance, and avoidance of lander damage.

Active Publication Date: 2021-12-21
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] To solve the problems that existing planetary soft landing control cannot guarantee an optimal guidance law, relies on relatively complex models, and is difficult to train to convergence, and to achieve a more autonomous planetary soft landing, a reinforcement-learning-based planetary soft landing control method, system, and storage medium are proposed.



Examples


Embodiment

[0118] 1) Experimental environment settings

[0119] In this example, DDPG, TD3, and SAC are selected for training and testing. The value-function network of the three algorithms is designed as shown in Figure 1, the policy network structure of DDPG and TD3 is shown in Figure 2, and the SAC policy network structure is shown in Figure 3. The networks are implemented in Python and built with PyTorch for training.
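The exact network structures live in Figures 1-3 and are not reproduced here. As a framework-free sketch of what an actor (policy) network of this kind computes, the following maps an observation vector to bounded action commands through a small tanh MLP; the layer sizes (13-dimensional observation, one 64-unit hidden layer, 4 thrust commands) are hypothetical placeholders, not the patent's actual design, and the patent builds such networks in PyTorch rather than by hand:

```python
import math, random

def mlp_forward(obs, weights, biases):
    """Forward pass through a fully connected tanh network.
    The final tanh bounds each action in [-1, 1], as DDPG/TD3
    actors typically do before scaling to engine thrust limits."""
    x = obs
    for W, b in zip(weights, biases):
        x = [math.tanh(sum(wij * xi for wij, xi in zip(row, x)) + bj)
             for row, bj in zip(W, b)]
    return x

# Hypothetical sizes: 13-dim lander observation -> 64 hidden -> 4 commands.
random.seed(0)
sizes = [13, 64, 4]
weights = [[[random.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]
           for m, n in zip(sizes, sizes[1:])]
biases = [[0.0] * n for n in sizes[1:]]

action = mlp_forward([0.5] * 13, weights, biases)
print(len(action), all(-1.0 <= a <= 1.0 for a in action))
```

In a trained controller the weights would come from the reinforcement-learning update rather than random initialization, and the critic (value-function) network of Figure 1 would score state-action pairs instead of emitting actions.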

[0120] The simulation test software environment of all algorithms in this paper is Ubuntu 16.04, and the hardware environment is an Intel(R) Core(TM) i5-9300H CPU + NVIDIA GeForce GTX 1660 Ti + 16.0 GB RAM.

[0121] 2) Experimental results and analysis

[0122] The training-process curves of the DDPG, TD3, and SAC algorithms are shown in Figure 4, Figure 5, and Figure 6. The DDPG episode reward begins to rise at about 10,000 episodes, increases markedly from 10,000 to 20,000 episodes, and then gradually and steadily converges to about 300; the training ends after abou...



Abstract

The invention discloses a planetary soft landing control method and system based on reinforcement learning, and a storage medium. It relates to the field of soft landing trajectory optimization and control, and aims to solve the problems that existing planetary soft landing control cannot ensure an optimal guidance law, relies on a relatively complex model, and is difficult to train to convergence. The method comprises the following steps: 1, establishing a six-degree-of-freedom dynamic model of the powered descent phase of the lander based on characteristics such as the lander's hardware configuration and engine power configuration; 2, designing the reward function, observation space, action space, and neural network structure of the training interaction environment; 3, establishing a numerical simulation environment and training with a reinforcement learning algorithm to obtain a soft landing controller; and 4, evaluating the trained controller through a speed tracking test and a soft landing test. Executing steps 1 and 2 yields the soft-landing reinforcement-learning environment model; the agent interacts with this environment model to obtain training data. Step 3 produces the soft landing controller, and step 4 selects the best-performing training result as the optimal soft landing controller. The method is used for soft landing trajectory optimization and control.
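Steps 1 and 2 of the abstract amount to wrapping a lander dynamics model in an interaction environment with a reward. The following is a deliberately simplified, hypothetical sketch of that idea: a one-dimensional vertical-descent toy (not the patent's six-degree-of-freedom model), with an illustrative speed-tracking reward and touchdown bonus; all constants and the hand-tuned stand-in controller are assumptions for the sketch:

```python
class SoftLandingEnv1D:
    """Toy vertical-descent environment: state = (altitude, velocity),
    action = throttle in [0, 1]. Illustrative stand-in for the patent's
    six-degree-of-freedom powered-descent model."""
    G, MAX_THRUST_ACC, DT = 3.71, 8.0, 0.1  # Mars-like gravity (m/s^2), max thrust accel, step (s)

    def reset(self):
        self.h, self.v = 1000.0, -60.0  # altitude (m), vertical speed (m/s)
        return (self.h, self.v)

    def step(self, throttle):
        throttle = min(max(throttle, 0.0), 1.0)
        a = throttle * self.MAX_THRUST_ACC - self.G
        self.v += a * self.DT
        self.h += self.v * self.DT
        done = self.h <= 0.0
        # Illustrative reward: track a reference descent speed that shrinks
        # with altitude; bonus for a soft touchdown, penalty for a crash.
        v_ref = -0.05 * max(self.h, 0.0) - 1.0
        reward = -abs(self.v - v_ref)
        if done:
            reward += 100.0 if abs(self.v) < 2.0 else -100.0
        return (self.h, self.v), reward, done

env = SoftLandingEnv1D()
state, done, total = env.reset(), False, 0.0
while not done:
    # Hand-tuned proportional controller standing in for a trained policy.
    v_ref = -0.05 * max(state[0], 0.0) - 1.0
    throttle = 0.46 + 0.3 * (v_ref - state[1])
    state, r, done = env.step(throttle)
    total += r
print(f"touchdown speed {state[1]:.2f} m/s")
```

In the patent's method, a reinforcement-learning algorithm (step 3) would replace the hand-tuned controller by interacting with the full six-degree-of-freedom environment, and step 4's speed tracking and soft landing tests would select the best result.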

Description

technical field [0001] The invention relates to a reinforcement-learning-based control method for planetary soft landing, belonging to the technical fields of soft landing trajectory optimization and control and of deep space exploration. Background technique [0002] Reinforcement learning is a class of machine learning algorithms in which an agent learns by trial and error, evaluating the quality of its behavior through rewards obtained by interacting with the environment; the goal is for the agent to obtain the greatest cumulative reward. These algorithms can generally be divided into two categories: model-based and model-free. [0003] Patent document CN110466805B discloses an asteroid landing guidance method based on optimized guidance parameters, which establishes the dynamic equation of the probe in the landing-point coordinate system and analyzes the movement of the probe in the three directions of the coordinate system of the la...
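The trial-and-error loop summarized in [0002] (act, observe a reward, adjust to maximize cumulative reward) can be illustrated in miniature. The example below is a hypothetical two-armed bandit with an epsilon-greedy agent, chosen only to show the agent-environment interaction; it is not the patent's algorithm or environment:

```python
import random

# Hypothetical two-armed bandit environment: arm 1 pays more on average.
def pull(arm):
    return random.gauss(1.0 if arm == 0 else 2.0, 0.1)

# Epsilon-greedy agent: estimate each arm's value from sampled rewards,
# exploring a random arm 10% of the time.
random.seed(0)
values, counts = [0.0, 0.0], [0, 0]
for step in range(2000):
    arm = random.randrange(2) if random.random() < 0.1 else max((0, 1), key=lambda a: values[a])
    reward = pull(arm)                              # interact with the environment
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

best = max((0, 1), key=lambda a: values[a])
print("learned values:", [round(v, 2) for v in values], "best arm:", best)
```

The deep algorithms named later in the document (DDPG, TD3, SAC) follow the same interaction loop but replace the value table with neural networks over continuous states and actions.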

Claims


Application Information

Patent Timeline
no application
Patent Type & Authority: Applications (China)
IPC(8): G05D1/10
CPC: G05D1/101; Y02T90/00
Inventor: 白成超, 郭继峰, 陈宇燊
Owner: HARBIN INST OF TECH