Deep reinforcement learning model training method and device based on hyper-parameter optimization

A deep reinforcement learning training method, applied in the computer field, that addresses problems such as poor adaptability of training results, large performance deviations, and restrictions on the deployment of deep reinforcement learning methods.

Pending Publication Date: 2021-11-30
JINGDONG CITY BEIJING DIGITS TECH CO LTD

AI Technical Summary

Problems solved by technology

However, training Deep RL requires large-scale interaction with the environment, and such interaction is unavailable in most real-world scenarios. This seriously restricts the practical deployment of deep reinforcement learning methods.

[0003] To solve this problem, Off-line Deep Reinforcement Learning (Off-line Deep RL) has been proposed in the related art. However, the training effect of current Off-line Deep RL methods varies greatly across data sets, so the achievable training effect still suffers from poor adaptability and low performance.



Examples


Embodiment Construction

[0032] Exemplary embodiments of the present application are described below, including various details of the embodiments to aid understanding; these details should be regarded as merely exemplary. Accordingly, those skilled in the art will appreciate that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, certain descriptions are omitted from what follows.

[0033] A training method and apparatus for the deep reinforcement learning model of the embodiments of the present application are described below with reference to the drawings.

[0034] It should be noted that an off-line deep reinforcement learning algorithm framework typically contains a plurality of deep neural networks, and the network parameter optimization of the training process needs to be alternately trai...


Abstract

The invention discloses a deep reinforcement learning model training method and device based on hyper-parameter optimization, wherein the method comprises the steps of obtaining a plurality of initial hyper-parameter combinations and a plurality of first deep reinforcement learning models; training a plurality of first deep reinforcement learning models by adopting a plurality of hyper-parameters in the initial hyper-parameter combination to obtain training evaluation indexes respectively corresponding to the plurality of first deep reinforcement learning models; screening out a second deep reinforcement learning model from the plurality of first deep reinforcement learning models according to the training evaluation index; performing optimization processing on the initial hyper-parameter combination by adopting a plurality of target hyper-parameters corresponding to the second deep reinforcement learning model to form a target hyper-parameter combination; and obtaining a target deep reinforcement learning model. Therefore, hyper-parameter optimization and model training are combined to achieve training of the deep reinforcement learning model, the deep reinforcement learning model with higher performance can be trained, and the trained model can adapt to wider application scenes.
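
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the "model" is a single weight trained by gradient descent on a toy quadratic, the only hyper-parameter is a learning rate `lr`, and the "optimization processing" step is assumed to be a random up-or-down scaling of the best model's hyper-parameters. All names (`Member`, `train_and_evaluate`, `perturb`, `hpo_training`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    """One candidate: a toy 'model' (a single weight) plus its hyper-parameters."""
    weight: float
    hparams: dict
    score: float = float("-inf")


def train_and_evaluate(member, steps=20):
    """Toy stand-in for training: gradient descent on (w - 3)^2.

    Returns the training evaluation index (higher is better)."""
    lr = member.hparams["lr"]
    for _ in range(steps):
        grad = 2.0 * (member.weight - 3.0)
        member.weight -= lr * grad
    member.score = -(member.weight - 3.0) ** 2
    return member.score


def perturb(hparams, rng, factor=1.2):
    """Assumed optimization rule: scale each hyper-parameter up or down."""
    return {k: v * rng.choice([1.0 / factor, factor]) for k, v in hparams.items()}


def hpo_training(population_size=4, rounds=5, seed=0):
    rng = random.Random(seed)
    # Obtain several initial hyper-parameter combinations and first models.
    population = [
        Member(weight=0.0, hparams={"lr": 10 ** rng.uniform(-3, -1)})
        for _ in range(population_size)
    ]
    for _ in range(rounds):
        # Train every first model and record its evaluation index.
        for member in population:
            train_and_evaluate(member)
        # Screen out the second model according to the evaluation index.
        best = max(population, key=lambda m: m.score)
        # Use the second model's hyper-parameters to update the other combinations.
        for member in population:
            if member is not best:
                member.hparams = perturb(best.hparams, rng)
    # Train once more with the final combinations and return the target model.
    for member in population:
        train_and_evaluate(member)
    return max(population, key=lambda m: m.score)


if __name__ == "__main__":
    target = hpo_training()
    print(target.hparams, target.weight, target.score)
```

Keeping the best member's hyper-parameters unchanged while perturbing the others is one simple way to interleave hyper-parameter search with training; the source does not specify the actual update rule.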

Description

Technical field

[0001] The present application relates to the field of computer technology, and more particularly to a deep reinforcement learning model training method based on hyper-parameter optimization, together with an apparatus, electronic device, storage medium, and computer program product.

Background technique

[0002] Deep Reinforcement Learning (Deep RL) is a technology that has emerged in recent years. It integrates two technologies: deep learning and reinforcement learning. Deep RL can perform pattern recognition on the high-dimensional state of a complex system and, on that basis, output actions. With deep reinforcement learning, learning takes place through interaction with the environment. Deep RL is suitable for control, decision-making, and complex system optimization tasks. It has large potential application space in fields such as games, autonomous driving control and decision-making, robot control, finance, and industrial system control....


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 张玥, 尹泽夏, 霍雨森, 王小波, 郑宇
Owner JINGDONG CITY BEIJING DIGITS TECH CO LTD