Model privacy protection method and system for deep reinforcement learning

A reinforcement learning and privacy protection technology, applied to model privacy protection based on imitation learning, that addresses problems such as attacks, data leakage, and security threats.

Active Publication Date: 2021-09-21

ZHEJIANG COLLEGE OF ZHEJIANG UNIV OF TECHOLOGY

18 Cites, 4 Cited by


Problems solved by technology

However, a deep reinforcement learning policy can be stolen through imitation learning or behavior cloning, and it is also vulnerable to adversarial sample perturbation attacks, which creates data leakage and adversarial security threats.

Embodiment Construction

[0033] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0034] Referring to Figures 1 to 3, a model privacy protection method for deep reinforcement learning comprises the following steps:

[0035] 1) Pre-train the target agent to obtain the target policy $\pi_t$: the deep deterministic policy gradient (DDPG) algorithm trains the car agent Car, whose goal is to reach the destination as quickly and safely as possible. DDPG extends the Actor-Critic method by combining the DQN algorithm with the deterministic policy gradient (DPG). A deterministic policy $\mu$ selects the action $a_t = \mu(s \mid \theta^{\mu})$, where $\theta^{\mu}$ denotes the parameters of the policy network $\mu(s \mid \theta^{\mu})$, which acts as the Actor, and $\theta^{Q}$ denotes the parameters of the value network $Q(s, a, \theta^{Q})$, which acts as the Critic. To improve training stability, a target network is introduced for both the policy...
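The Actor, Critic, and target networks named in step 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the linear networks, the dimensions `STATE_DIM` and `ACTION_DIM`, the rate `TAU`, and the soft-update rule are all hypothetical choices taken from the standard DDPG formulation, since the excerpt is cut off before it describes how the target networks are updated.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2  # hypothetical sizes, not given in the patent
TAU = 0.005                   # hypothetical soft-update rate (assumption)


def actor(theta_mu, s):
    """Deterministic policy a = mu(s | theta_mu), squashed to [-1, 1]."""
    return np.tanh(theta_mu["W"] @ s)


def critic(theta_q, s, a):
    """Q(s, a | theta_Q): a linear critic over the concatenated state and action."""
    return float(theta_q["w"] @ np.concatenate([s, a]))


def soft_update(target, online, tau=TAU):
    """Move every target parameter a fraction tau toward its online counterpart."""
    return {k: (1.0 - tau) * target[k] + tau * online[k] for k in online}


def td_target(r, s_next, done, target_mu, target_q, gamma=0.99):
    """Bootstrapped critic target computed only from the target networks."""
    a_next = actor(target_mu, s_next)
    return r + gamma * (1.0 - done) * critic(target_q, s_next, a_next)


# Online parameters theta_mu (Actor) and theta_q (Critic), plus target copies.
theta_mu = {"W": rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))}
theta_q = {"w": rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)}
target_mu = {k: v.copy() for k, v in theta_mu.items()}
target_q = {k: v.copy() for k, v in theta_q.items()}

s = rng.normal(size=STATE_DIM)
a_t = actor(theta_mu, s)                        # a_t = mu(s | theta_mu)
y = td_target(1.0, s, 0.0, target_mu, target_q)  # regression target for the Critic

# After a (not shown) gradient step on the online networks, the targets follow slowly.
target_mu = soft_update(target_mu, theta_mu)
target_q = soft_update(target_q, theta_q)
```

Computing the Critic's regression target from slowly moving copies, rather than from the networks being trained, is what gives the stability benefit that the paragraph mentions.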


Abstract

The invention provides a model privacy protection method for deep reinforcement learning. The method comprises the following steps: pre-training a target agent to obtain a target policy π_t; according to the policy π_t of the pre-trained deep reinforcement learning model, generating a sequence of car-driving state-action pairs over T time steps as expert data for imitation learning; generating an imitation policy π_IL based on imitation learning; performing privacy protection on the model of the target agent; and performing adversarial training on the target agent. The invention further comprises a model privacy protection system for deep reinforcement learning. The method prevents a stealer from mounting attacks through the stolen model, and keeps the performance of the imitation policy low while preserving the good performance of the target policy, thereby achieving model privacy protection.
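The policy-stealing threat that the abstract describes (roll out the target policy for T steps, keep the state-action pairs as expert data, fit an imitation policy to them) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the linear stand-in for the target policy, the `step` dynamics, and the use of least-squares behavior cloning in place of whatever imitation learning method the patent actually uses, which the excerpt does not show.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200        # number of time steps of expert data
STATE_DIM = 3  # hypothetical state size, not given in the patent

# Hypothetical stand-in for the pre-trained target policy (linear for simplicity).
W_TARGET = rng.normal(size=(1, STATE_DIM))


def target_policy(s):
    return W_TARGET @ s


def step(s, a):
    """Hypothetical toy dynamics standing in for the car-driving environment."""
    return 0.9 * s + 0.1 * a.sum() + 0.05 * rng.normal(size=s.shape)


def collect_expert_data(policy, num_steps):
    """Roll the policy out and record one (state, action) pair per time step."""
    s = rng.normal(size=STATE_DIM)
    states, actions = [], []
    for _ in range(num_steps):
        a = policy(s)
        states.append(s)
        actions.append(a)
        s = step(s, a)
    return np.array(states), np.array(actions)


states, actions = collect_expert_data(target_policy, T)

# Behavior cloning as supervised regression: find W_il with states @ W_il ~= actions.
W_il, *_ = np.linalg.lstsq(states, actions, rcond=None)


def imitation_policy(s):
    return W_il.T @ s


# Largest disagreement between the cloned and the target policy on the expert states.
gap = max(abs(float(imitation_policy(s)[0] - target_policy(s)[0])) for s in states)
```

In this unprotected toy setting the clone matches the target policy almost exactly, which is the outcome the privacy protection and adversarial training steps are meant to prevent.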

Description

Technical field

[0001] The invention belongs to the field of model privacy protection for deep reinforcement learning, and in particular relates to a model privacy protection method and system based on imitation learning.

Background

[0002] Deep reinforcement learning is one of the directions of artificial intelligence that has attracted much attention in recent years. With its rapid development, reinforcement learning has been widely applied in robot control, game playing, computer vision, autonomous driving, and other fields. To ensure that deep reinforcement learning can be applied safely in safety-critical fields, the key is to analyze and discover vulnerabilities in deep reinforcement learning algorithms and models, so that people with ulterior motives cannot exploit them for illegal profit. Different from the single-step prediction task of traditional machine learning, the deep rein...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F21/62, G06N3/00, G06N3/04, G06N3/08

CPC: G06F21/6245, G06N3/004, G06N3/08, G06N3/045

Inventor: 何文秀

Owner: ZHEJIANG COLLEGE OF ZHEJIANG UNIV OF TECHOLOGY
