Deep reinforcement learning strategy optimization defense method and device based on imitation learning

A deep reinforcement learning technique, applied to neural network learning methods, biological neural network models, data processing applications, etc. It addresses problems such as attacks on automated decision-making, inaccurate decision results, and exploitable vulnerabilities, and improves robustness.

Pending Publication Date: 2021-06-01

ZHEJIANG UNIV OF TECH


Problems solved by technology

[0005] In safety-critical decision-making fields such as autonomous driving, existing reinforcement learning is vulnerable to attacks and to exploitable flaws in automated decision-making, which lead to inaccurate decisions and potential safety hazards. To solve this problem, the purpose of the present invention is to provide a deep reinforcement learning strategy optimization defense method and device based on imitation learning.


Detailed Description of Embodiments

[0019] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, and do not limit the protection scope of the present invention.

[0020] Building on robust-learning defense mechanisms, this embodiment provides a deep reinforcement learning policy optimization defense method based on imitation learning, applied mainly in autonomous driving scenarios. The technical concept is as follows: during deep reinforcement learning training for simulated autonomous driving of a car, an attack based on strategy poisoning causes the learner to learn a wrong strategy and therefore to choose bad actions. Based on this situation, this method uses the imitation learn...
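As a rough illustration of the poisoning effect described in [0020], the sketch below trains a tiny tabular learner twice, once on clean rewards and once on tampered rewards, and compares the greedy action each one ends up with. It assumes the poisoning takes the form of reward tampering, which the excerpt does not specify. The two-action setup and the names `learn_q`, `greedy`, and `TRUE_REWARDS` are hypothetical and stand in for the patent's deep Q network and driving simulator.

```python
from typing import Callable, Dict, List


def learn_q(
    rewards: Dict[int, float],
    poison: Callable[[int, float], float],
    episodes: int = 50,
    lr: float = 0.5,
) -> List[float]:
    """Estimate one Q value per action from (possibly tampered) rewards."""
    q = [0.0 for _ in rewards]
    for _ in range(episodes):
        for action, reward in rewards.items():
            # The learner only ever sees the reward after the attacker's edit.
            observed = poison(action, reward)
            q[action] += lr * (observed - q[action])
    return q


def greedy(q: List[float]) -> int:
    """Return the action with the largest estimated Q value."""
    return max(range(len(q)), key=lambda a: q[a])


# Hypothetical toy task: action 0 is the safe choice, action 1 is the bad one.
TRUE_REWARDS = {0: 1.0, 1: -1.0}

# Clean training: rewards pass through unchanged.
clean_q = learn_q(TRUE_REWARDS, lambda action, reward: reward)
# Poisoned training (assumed attack model): every reward has its sign flipped.
poisoned_q = learn_q(TRUE_REWARDS, lambda action, reward: -reward)

clean_action = greedy(clean_q)
poisoned_action = greedy(poisoned_q)
print(clean_action, poisoned_action)
```

With clean rewards the learner prefers the safe action. With flipped rewards it prefers the bad one, even though the learning rule itself is unchanged.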


Abstract

The invention discloses a deep reinforcement learning strategy optimization defense method and device based on imitation learning. The method comprises the following steps: building an autonomous driving simulation environment for a deep reinforcement learning agent, constructing a target agent based on a deep Q network, and training the target agent by reinforcement learning to optimize the parameters of the deep Q network; using the parameter-optimized deep Q network to generate a sequence of state-action pairs of the target agent over T time steps as expert data, wherein the action in each state-action pair is the action with the minimum Q value; constructing an adversarial agent based on a generative adversarial network and training it by imitation learning, that is, taking the states in the expert data as the input of the generative adversarial network and the expert data as labels to supervise the optimization of the network's parameters; and performing adversarial training on the target agent based on the states generated by the adversarial agent, then further optimizing the parameters of the deep Q network, thereby achieving the deep reinforcement learning strategy optimization defense.
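The expert-data step in the abstract can be sketched as follows. This is a minimal illustration and not the patent's implementation. A small lookup table (`Q_TABLE`) stands in for the parameter-optimized deep Q network, and `toy_step` stands in for the driving simulator. Both are hypothetical, as are the function names. The sketch shows only how T state-action pairs are collected when each action is the one with the minimum Q value, as the abstract states.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = int
Action = int


def min_q_action(q_values: Sequence[float]) -> Action:
    """Return the index of the action with the smallest Q value."""
    return min(range(len(q_values)), key=lambda a: q_values[a])


def collect_expert_data(
    q_fn: Callable[[State], Sequence[float]],
    step_fn: Callable[[State, Action], State],
    initial_state: State,
    horizon: int,
) -> List[Tuple[State, Action]]:
    """Roll out `horizon` (T) steps and record (state, action) pairs."""
    state = initial_state
    pairs: List[Tuple[State, Action]] = []
    for _ in range(horizon):
        action = min_q_action(q_fn(state))
        pairs.append((state, action))
        state = step_fn(state, action)
    return pairs


# Hypothetical stand-in for the trained deep Q network: one row of
# Q values (three actions) per discrete state.
Q_TABLE: Dict[State, List[float]] = {
    0: [1.0, 0.2, 0.5],
    1: [0.3, 0.9, 0.1],
    2: [0.4, 0.6, 0.8],
}


def toy_q(state: State) -> Sequence[float]:
    return Q_TABLE[state]


def toy_step(state: State, action: Action) -> State:
    """Hypothetical deterministic transition over three states."""
    return (state + action + 1) % 3


expert_data = collect_expert_data(toy_q, toy_step, initial_state=0, horizon=4)
print(expert_data)  # [(0, 1), (2, 0), (0, 1), (2, 0)]
```

In the method described above, pairs of this kind would then serve as the supervision signal for the adversarial agent's imitation learning. A real setup would replace the table with a neural network's forward pass and the toy transition with the simulator's step function.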

Description

Technical field

[0001] The invention belongs to the field of defense for deep reinforcement learning, and in particular relates to a deep reinforcement learning strategy optimization defense method and device based on imitation learning.

Background

[0002] Deep reinforcement learning is one of the directions of artificial intelligence that has attracted much attention in recent years. With its rapid development, reinforcement learning has been widely applied in robot control, games, computer vision, autonomous driving and other fields. To ensure the safe application of deep reinforcement learning in safety-critical fields, the key is to analyze and discover vulnerabilities in deep reinforcement learning algorithms and models, so that malicious actors cannot exploit them for illegal gain. Different from the single-step prediction task of traditional machine l...


Application Information


IPC(8): G06N3/04, G06N3/08, G06Q10/04

CPC: G06N3/084, G06Q10/04, G06N3/045

Inventor: 陈晋音, 章燕, 王雪柯, 胡书隆

Owner: ZHEJIANG UNIV OF TECH
