Model privacy protection method and system for deep reinforcement learning

A reinforcement learning and privacy protection technology, applied in the field of model privacy protection based on imitation learning, which addresses problems such as attacks, data leakage, and security threats.

Active Publication Date: 2021-09-21
ZHEJIANG COLLEGE OF ZHEJIANG UNIV OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

However, a deep reinforcement learning policy can be stolen through imitation learning or behavior cloning, and it is also vulnerable to adversarial sample perturbation attacks, which raises problems of data leakage and adversarial security threats.

Embodiment Construction

[0033] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0034] Referring to Figures 1 to 3, a model privacy protection method for deep reinforcement learning comprises the following steps:

[0035] 1) Pre-train the target agent to obtain the target policy π_t: the deep deterministic policy gradient (DDPG) algorithm is used to train the car agent Car, whose goal is to reach the destination as quickly and safely as possible. DDPG builds on the Actor-Critic method, the DQN algorithm, and the deterministic policy gradient (DPG). A deterministic policy μ selects the action a_t = μ(s | θ^μ), where θ^μ denotes the parameters of the policy network μ(s | θ^μ), and μ(s) acts as the Actor. θ^Q denotes the parameters of the value network Q(s, a | θ^Q), and the function Q(s, a) acts as the Critic. To improve training stability, a target network is introduced for both the policy...
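The Actor and Critic roles described in step 1) can be illustrated with a minimal sketch. It assumes single-layer linear networks and the soft target update used in standard DDPG; the patent text is truncated before it specifies the target-network update, so `soft_update`, `td_target`, `tau`, and `gamma` are illustrative assumptions rather than the patented procedure.

```python
import math

# Hypothetical single-layer networks; the patent does not give the architecture.

def actor(state, theta_mu):
    """Deterministic policy a = mu(s | theta_mu), squashed to [-1, 1] by tanh."""
    weights, bias = theta_mu
    return math.tanh(sum(w * s for w, s in zip(weights, state)) + bias)

def critic(state, action, theta_q):
    """Value estimate Q(s, a | theta_Q) as a linear function of (state, action)."""
    weights, bias = theta_q
    features = list(state) + [action]
    return sum(w * x for w, x in zip(weights, features)) + bias

def soft_update(target, online, tau=0.01):
    """Standard DDPG soft update (assumed): target <- tau*online + (1-tau)*target."""
    target_w, target_b = target
    online_w, online_b = online
    new_w = [tau * o + (1.0 - tau) * t for o, t in zip(online_w, target_w)]
    new_b = tau * online_b + (1.0 - tau) * target_b
    return (new_w, new_b)

def td_target(reward, next_state, target_mu, target_q, gamma=0.99):
    """Bootstrap target y = r + gamma * Q'(s', mu'(s')) from the target networks."""
    next_action = actor(next_state, target_mu)
    return reward + gamma * critic(next_state, next_action, target_q)
```

Because the target networks move only slowly toward the online networks, the bootstrap target `y` changes gradually between updates, which is the usual stability argument for introducing them.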

Abstract

The invention provides a model privacy protection method for deep reinforcement learning. The method comprises the following steps: pre-training a target agent to obtain a target policy π_t; according to the policy π_t of the pre-trained deep reinforcement learning model, generating a car driving sequence of state-action pairs over T time steps as expert data for imitation learning; generating an imitation policy π_IL based on imitation learning; performing privacy protection on the model of the target agent; and performing adversarial training on the target agent. The invention further comprises a model privacy protection system for deep reinforcement learning. The method prevents an attacker from mounting attacks by means of a stolen model: it keeps the performance of the imitation policy low while preserving the good performance of the target policy, thereby achieving model privacy protection.
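The expert-data step in the abstract can be sketched as follows, from the stealer's side. This is a toy illustration only: `target_policy`, the one-dimensional dynamics in `step`, and the least-squares `behavior_cloning` fit are hypothetical stand-ins, and behavior cloning is used here as the simplest imitation method, not necessarily the one the patent employs.

```python
def target_policy(state):
    """Hypothetical stand-in for the pre-trained target policy pi_t."""
    return -0.5 * state

def step(state, action):
    """Toy one-dimensional dynamics (assumed): the action shifts the state."""
    return state + action

def collect_expert_data(policy, initial_state, horizon):
    """Roll out the policy for `horizon` (T) steps and record (state, action) pairs."""
    pairs = []
    state = initial_state
    for _ in range(horizon):
        action = policy(state)
        pairs.append((state, action))
        state = step(state, action)
    return pairs

def behavior_cloning(pairs):
    """Fit a linear imitation policy a = k * s by least squares on the expert pairs.

    Assumes at least one recorded state is non-zero, otherwise k is undefined.
    """
    numerator = sum(s * a for s, a in pairs)
    denominator = sum(s * s for s, _ in pairs)
    gain = numerator / denominator
    return lambda state: gain * state

expert_pairs = collect_expert_data(target_policy, initial_state=4.0, horizon=5)
imitation_policy = behavior_cloning(expert_pairs)
```

In this unprotected toy setting the imitation policy recovers the target policy exactly. The protection described in the abstract aims at the opposite outcome: an imitation policy that performs poorly even though the target policy still performs well.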

Description

Technical field

[0001] The invention belongs to the field of model privacy protection for deep reinforcement learning, and in particular relates to a model privacy protection method and system based on imitation learning.

Background

[0002] Deep reinforcement learning is one of the directions of artificial intelligence that has attracted the most attention in recent years. With its rapid development, reinforcement learning has been widely applied in robot control, game playing, computer vision, autonomous driving, and other fields. To ensure that deep reinforcement learning can be applied safely in safety-critical fields, the key is to analyze and discover vulnerabilities in deep reinforcement learning algorithms and models, so that malicious parties cannot exploit them for illegal profit. Different from the single-step prediction task of traditional machine learning, the deep rein...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F21/62, G06N3/00, G06N3/04, G06N3/08
CPC: G06F21/6245, G06N3/004, G06N3/08, G06N3/045
Inventor: 何文秀
Owner: ZHEJIANG COLLEGE OF ZHEJIANG UNIV OF TECHNOLOGY