Offline reinforcement learning method and device based on state offset correction

A technique in the field of offline reinforcement learning and state transfer. It addresses problems such as the mismatch between the state distributions of the trained policy and the dataset policy, which arises when a method ignores how comprehensively the dataset covers the state space. Its effect is to improve algorithm performance and reduce state deviation.

Pending Publication Date: 2022-07-22
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0008] This application provides an offline reinforcement learning method and device based on state offset correction. It addresses a technical problem in the related art: policy-constraint methods ignore how comprehensively the dataset covers the state space, so the state distribution of the trained policy does not match that of the dataset policy.


Embodiment Construction

[0028] The following describes in detail the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to be used to explain the present application, but should not be construed as a limitation to the present application.

[0029] The offline reinforcement learning method and apparatus based on state offset correction according to the embodiments of the present application are described below with reference to the accompanying drawings. The background section above identified a technical problem in the related art: policy-constraint methods ignore how comprehensively the dataset covers the state space, thereby causing the st...

Abstract

The invention discloses an offline reinforcement learning method and device based on state offset correction. The method comprises: randomly sampling a sample of the current time step from an offline reinforcement learning dataset, and training an environment model and a state transition model by maximum likelihood estimation; constructing a disturbed state based on preset noise, obtaining a first next-time-step state predicted by the trained environment model based on the action, and obtaining a second next-time-step state predicted by the trained state transition model based on the current state; and updating the policy network using the distance between the first state and the second state, and training an action-value network with a conservative action-value function until a convergence condition is met, thereby generating an offline reinforcement learning model based on state offset correction. This solves the technical problem in the related art that policy-constraint methods ignore how comprehensively the dataset covers the state space, which leaves the state distribution of the trained policy mismatched with that of the dataset policy.
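The policy-update step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it uses scalar states and actions, replaces the neural networks with linear models (for which least squares is the Gaussian maximum-likelihood fit), and omits the conservative action-value training entirely. All function names, the toy dynamics, and the hyperparameters are hypothetical. It also assumes that the action fed to the environment model is the policy's action at the disturbed state, which the abstract does not state explicitly.

```python
import random


def make_dataset(n=500, seed=0):
    """Toy offline dataset of (s, a, s_next) tuples; the dynamics are made up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = -0.4 * s + rng.gauss(0.0, 0.2)        # behavior policy plus exploration
        s_next = 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.01)
        data.append((s, a, s_next))
    return data


def fit_environment_model(data):
    """Least-squares fit of s_next ~ w_s*s + w_a*a via the 2x2 normal equations."""
    sss = sum(s * s for s, a, n in data)
    saa = sum(a * a for s, a, n in data)
    ssa = sum(s * a for s, a, n in data)
    ssn = sum(s * n for s, a, n in data)
    san = sum(a * n for s, a, n in data)
    det = sss * saa - ssa * ssa
    w_s = (ssn * saa - san * ssa) / det
    w_a = (san * sss - ssn * ssa) / det
    return w_s, w_a


def fit_state_transition_model(data):
    """Least-squares fit of s_next ~ c*s: the next state the dataset reaches from s."""
    return sum(s * n for s, a, n in data) / sum(s * s for s, a, n in data)


def train_policy(data, w_s, w_a, c, noise_std=0.1, lr=0.05, steps=2000, seed=0):
    """Stochastic gradient descent on the squared distance between the two predictions."""
    rng = random.Random(seed)
    k = 0.0                                        # linear policy: a = k * s_hat
    for _ in range(steps):
        s, _, _ = rng.choice(data)                 # random sample from the dataset
        s_hat = s + rng.gauss(0.0, noise_std)      # disturbed state from preset noise
        a = k * s_hat                              # policy action (assumed input: s_hat)
        first = w_s * s_hat + w_a * a              # environment model prediction
        second = c * s                             # state transition model prediction
        err = first - second
        k -= lr * 2.0 * err * w_a * s_hat          # gradient of err**2 w.r.t. k
    return k


if __name__ == "__main__":
    data = make_dataset()
    w_s, w_a = fit_environment_model(data)
    c = fit_state_transition_model(data)
    k = train_policy(data, w_s, w_a, c)
    print(f"environment model: w_s={w_s:.3f}, w_a={w_a:.3f}")
    print(f"state transition model: c={c:.3f}")
    print(f"learned policy gain: k={k:.3f}")
```

In this toy setting the learned gain pulls the predicted next state from a disturbed state back toward the next state the dataset reaches from the undisturbed one, which is the intuition behind correcting state offset.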

Description

Technical field

[0001] The present application relates to the technical field of offline reinforcement learning, and in particular to an offline reinforcement learning method and device based on state offset correction.

Background

[0002] Reinforcement learning mainly studies how an agent obtains the maximum reward, that is, how it learns the optimal policy for solving a given task. Reinforcement learning has received extensive attention because it can model sequential decision-making problems more intuitively. In recent years, with the rise of deep learning and large-scale datasets, and owing to the strong generalization ability of deep neural networks as function approximators, reinforcement learning has used neural networks to handle more complex scenarios. Deep reinforcement learning has made rapid progress in video games, Go, recommender systems, and robotics.

[0003] Compared with general reinforcement learning, offline reinforcement learning only...

Application Information

IPC(8): G06F30/27, G06F119/02
CPC: G06F30/27, G06F2119/02
Inventor: 季向阳, 张宏昌, 邵键准
Owner: TSINGHUA UNIV