
Deep reinforcement learning-oriented strategy anomaly detection method and device

A reinforcement learning and anomaly detection technology, applied in the field of security defense for deep reinforcement learning. It addresses the poor detection performance of existing abnormal strategy detection methods, and offers strong real-time performance, high feasibility, and the avoidance of serious losses.

Pending Publication Date: 2021-08-24
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] The two abnormal strategy detection methods described above perform poorly, and a better abnormal strategy detection method is urgently needed.


Embodiment Construction

[0023] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, and do not limit the protection scope of the present invention.

[0024] In safety-critical reinforcement learning decision-making scenarios such as autonomous driving, the decision model itself may contain undetected decision-making vulnerabilities, and it is also vulnerable to adversarial attacks, which creates security risks. In particular, during autonomous driving the smart car is vulnerable to adversarial attacks in the action execution stage, which may cause the agent to move in a wrong or even dangerous direction. In view of this, the embodiment provides a policy anomaly detection method and device oriented to deep reinf...


Abstract

The invention discloses a deep reinforcement learning-oriented strategy anomaly detection method and device. The method comprises: performing reinforcement learning on a DDPG network using collected state samples; constructing an imitation learning network comprising an actor network and a discriminator, and training the imitation learning network using the collected state samples and expert state-action pairs; generating a state-action pair from an input state sample using the parameter-optimized DDPG network, and discriminating the state-action pair using the parameter-optimized discriminator. When the discrimination result is 1, the action is considered not attacked; when the discrimination result is 0, the state-action pair is considered abnormal. When the action is abnormal and the action amplitude difference is outside the threshold range, the action is deemed attacked, and the state-action pair generated by the imitation learning network replaces the state-action pair generated by the DDPG network, so as to guide the DDPG network to make correct decisions in the subsequent stage of the reinforcement learning process.
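The detection rule in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `check_action`, `amplitude_difference`, and `Verdict` are hypothetical, the three callables stand in for the trained DDPG actor, imitation-learning actor, and discriminator, the amplitude difference is assumed to be measured between the DDPG action and the imitation-learning action, and the "threshold range" is assumed to be a single upper bound.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


@dataclass
class Verdict:
    action: Action   # action that should actually be executed
    abnormal: bool   # discriminator output was 0
    attacked: bool   # abnormal AND amplitude difference above the threshold


def amplitude_difference(a: Action, b: Action) -> float:
    """Largest per-dimension gap between two actions (an assumed metric)."""
    return max(abs(x - y) for x, y in zip(a, b))


def check_action(
    state: State,
    ddpg_policy: Callable[[State], Action],         # stand-in for the trained DDPG actor
    imitation_policy: Callable[[State], Action],    # stand-in for the imitation-learning actor
    discriminator: Callable[[State, Action], int],  # returns 1 (normal) or 0 (abnormal)
    threshold: float,                               # assumed single upper bound
) -> Verdict:
    ddpg_action = ddpg_policy(state)

    # Discrimination result 1: the action is considered not attacked.
    if discriminator(state, ddpg_action) == 1:
        return Verdict(ddpg_action, abnormal=False, attacked=False)

    # Discrimination result 0: the state-action pair is abnormal.
    reference_action = imitation_policy(state)
    diff = amplitude_difference(ddpg_action, reference_action)

    # Abnormal and outside the threshold: treat as attacked and
    # substitute the imitation-learning action.
    if diff > threshold:
        return Verdict(reference_action, abnormal=True, attacked=True)

    # Abnormal but within the threshold: flag it, keep the DDPG action.
    return Verdict(ddpg_action, abnormal=True, attacked=False)
```

In a training loop, the returned `action` would be the one executed and stored, so that a replaced state-action pair guides later DDPG updates; how the patent feeds the replacement back into training is not specified in the text available here.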

Description

Technical field

[0001] The invention belongs to the field of security defense for deep reinforcement learning, and in particular relates to a policy anomaly detection method and device for deep reinforcement learning.

Background art

[0002] With the continuous development of artificial intelligence technology, Deep Reinforcement Learning (DRL) has been valued and favored by experts and scholars since it was proposed. It has been developed in depth and widely used in fields such as autonomous driving, robot control, games, and healthcare. As an indispensable technology in the field of artificial intelligence, DRL is also constantly being extended. Reinforcement learning (RL) is a key part of DRL; at its core, the agent obtains a relatively better strategy by continuously maximizing its reward. However, related studies have shown that the agent is vulnerable to adversarial attacks during the policy execution phase. The attacke...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08, G06F21/55
CPC: G06N3/08, G06F21/55, Y02T10/40
Inventor: 陈晋音, 胡书隆, 章燕, 王雪柯
Owner: ZHEJIANG UNIV OF TECH