
A Multi-Core Processor-Single Graphics Processor Deep Reinforcement Learning Acceleration Method

A multi-core processor and graphics processor technology, applied in the computer field, addressing the problems that prior work does not cover hardware realization and that the same algorithm achieves different acceleration on different hardware environments

Active Publication Date: 2022-04-22
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 9 | Cited by: 0

AI Technical Summary

Problems solved by technology

The literature (Liang Xingxing, Feng Yanghe, Ma Yang, Cheng Guangquan, Huang Jincai, Wang Qi, Zhou Yuzhen, Liu Zhong. A review of multi-agent deep reinforcement learning [J/OL]. Acta Automatica Sinica, 2019.05) describes multi-environment (agent) communication methods for DRL and the parameters that influence them, but it does not address how to realize them on hardware, and the acceleration achievable by the same algorithm differs across hardware environments.

Method used




Embodiment Construction

[0023] The technical solutions in the embodiments of the present invention are described clearly and completely below, in conjunction with the accompanying drawings:

[0024] Figure 1 shows the implementation process of the deep reinforcement learning acceleration method based on the multi-core CPU-GPU platform of the present invention, which includes the following steps:

[0025] 1. Allocate memory space for the CPU and GPU. Three memory spaces are set up on the CPU: one stores the experience information pool used for network training; the other two store the action network parameters θ and the evaluation network parameters ω, respectively. Two memory spaces are allocated on the GPU to store the local action network parameters θ⁻ and the local evaluation network parameters ω⁻. Besides each controlling its own internal memory, the CPU and GPU memories can communicate over the PCIe bus through read and write operations. U...
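As a minimal sketch of this allocation step (not the patented implementation itself), the Python/PyTorch code below sets up the three CPU-side memory spaces and the two GPU-side parameter copies; the pool capacity, tensor shapes, and network architectures are assumptions chosen only for illustration.

```python
import torch

# Assumed sizes, for illustration only.
POOL_CAPACITY = 10_000            # capacity of the experience information pool (assumption)
OBS_DIM, ACT_DIM = 8, 2           # observation/action dimensions (assumption)

# CPU side: three memory spaces.
# (1) Experience information pool used for network training.
experience_pool = {
    "obs":    torch.zeros(POOL_CAPACITY, OBS_DIM),
    "action": torch.zeros(POOL_CAPACITY, ACT_DIM),
    "reward": torch.zeros(POOL_CAPACITY, 1),
}
# (2) Action network parameters theta and (3) evaluation network parameters omega.
actor  = torch.nn.Sequential(torch.nn.Linear(OBS_DIM, 64), torch.nn.Tanh(), torch.nn.Linear(64, ACT_DIM))
critic = torch.nn.Sequential(torch.nn.Linear(OBS_DIM, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
theta = actor.state_dict()        # action network parameters, CPU memory
omega = critic.state_dict()       # evaluation network parameters, CPU memory

# GPU side: two memory spaces holding the local copies theta- and omega-.
# The .to(device) / .cpu() calls play the role of write/read transfers over the PCIe bus.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
theta_local = {k: v.to(device) for k, v in theta.items()}   # local action network parameters
omega_local = {k: v.to(device) for k, v in omega.items()}   # local evaluation network parameters

# Example of writing parameters back from GPU memory to CPU memory.
theta = {k: v.cpu() for k, v in theta_local.items()}
```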



Abstract

The present invention proposes a multi-core processor-single graphics processor deep reinforcement learning acceleration method. It establishes a deep reinforcement learning framework based on the PPO algorithm on a CPU+GPU platform and introduces a pipelined approach that accelerates sample collection and inference. Multi-environment simulation is realized on the multi-core CPU, with multiple environment processes arranged on each core, while the CPU also controls the data flow; neural network model inference is implemented on the GPU. The CPU and the GPU each reserve memory space for the action network and evaluation network parameters. During the interaction between the environments and the agent, each step's information is stored in the CPU's experience information pool memory; according to the truncation parameters and filter conditions, batch-sized data are extracted from this experience pool for training. Through load balancing, when the simulation time of the environment processes stacked on a single CPU core equals the neural network inference time, the invention realizes a pipeline in which environment sampling and action inference execute in parallel: half of the CPU simulators run in parallel with GPU inference for the other half, and the next half-CPU-simulation/half-GPU-inference round is prepared during data transmission. This further accelerates reinforcement learning, reaching an overall training speed nearly twice that of conventional methods.
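To make the pipelined structure concrete, the following Python/PyTorch sketch is offered as an illustrative assumption rather than the patented implementation: it overlaps CPU-side simulation of one half of the environments with GPU-side action inference for the other half, then swaps the halves each iteration; the environment count, network sizes, and the `simulate` helper are hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Placeholder policy network and toy environments; all sizes are assumptions for illustration.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2)).to(device)

N_ENVS, OBS_DIM = 16, 8
obs = torch.randn(N_ENVS, OBS_DIM)                 # current observations, held in CPU memory
half_a, half_b = slice(0, N_ENVS // 2), slice(N_ENVS // 2, N_ENVS)

def simulate(observations, actions=None):
    """Stand-in for stepping the CPU environment processes; a real simulator
    would apply `actions` in each environment process pinned to a CPU core."""
    return observations + 0.01 * torch.randn_like(observations)

stream = torch.cuda.Stream() if device.type == "cuda" else None

with torch.no_grad():
    for step in range(100):
        # GPU: launch inference for half A asynchronously (PCIe copy + forward pass).
        if stream is not None:
            with torch.cuda.stream(stream):
                actions_a = policy(obs[half_a].to(device, non_blocking=True))
        else:
            actions_a = policy(obs[half_a].to(device))

        # CPU: simulate half B while the GPU inference for half A is in flight.
        obs[half_b] = simulate(obs[half_b])

        # Wait for the GPU, bring half A's actions back over PCIe, step half A,
        # then swap the halves so the pipeline alternates each iteration.
        if stream is not None:
            stream.synchronize()
        obs[half_a] = simulate(obs[half_a], actions_a.cpu())
        half_a, half_b = half_b, half_a
```

In the load-balanced case described above, the time to simulate the environments stacked on one CPU core matches the GPU inference time, so the two stages fully overlap and the loop approaches twice the throughput of a sequential sample-then-infer scheme.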

Description

Technical field

[0001] The invention belongs to the field of computers, and in particular to a deep reinforcement learning acceleration method based on a multi-core central processing unit (CPU)-single graphics processing unit (GPU) platform.

Background technique

[0002] Deep reinforcement learning (DRL) is the product of combining deep learning with reinforcement learning. It integrates the powerful perceptual understanding of deep learning on problems such as vision with the decision-making ability of reinforcement learning, realizing end-to-end learning. The emergence of deep reinforcement learning has made reinforcement learning truly practical, able to solve complex problems in real-world scenarios, and it has been widely applied in engineering fields such as industrial manufacturing, robot localization and recognition, and games.

[0003] Literature published in 2013 (Mnih V, Kavukcuoglu K, Silver D, et al. ...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F9/50, G06T1/20, G06N3/04, G06N3/063, G06N3/08
CPC: G06F9/505, G06T1/20, G06N3/063, G06N3/084, G06N3/045
Inventor: 阮爱武, 朱重阳
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA