Inverse reinforcement learning processing method and device, storage medium and electronic device

A technique for inverse reinforcement learning processing, applied in the field of communications, that addresses the high cost of collecting expert demonstrations and achieves the effect of reducing demonstration cost.

Pending Publication Date: 2022-01-11
ZTE CORP +1

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present application provide an inverse reinforcement learning processing method, device, storage medium, and electronic device that at least solve a problem in the related art: inverse reinforcement learning requires collecting a large number of expert demonstrations, which leads to high cost.



Embodiment Construction

[0115] Embodiments of the present application will be described in detail below with reference to the drawings and in combination with the embodiments.

[0116] It should be noted that the terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

[0117] The method embodiments provided in the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Figure 1 is a block diagram of the hardware structure of a mobile terminal for the inverse reinforcement learning processing method of an embodiment of the present application. As shown in Figure 1, the mobile terminal may include one or more processors 102 (only one is shown in Figure 1) (processor 102 may include but is not limited to a...


Abstract

The embodiments of the invention provide an inverse reinforcement learning processing method and device, a storage medium, and an electronic device. The method comprises: obtaining a target demonstration trajectory that is provided with a target state from a preset state candidate set as its initial state; adding the target demonstration trajectory to an initialized demonstration set to obtain an updated demonstration set; and performing inverse reinforcement learning training on the updated demonstration set to obtain a target policy that meets a preset condition. This addresses the problem in the related art that inverse reinforcement learning requires collecting a large number of expert demonstrations, which is costly. The most valuable state (the target state with the maximum contribution value) is selected from the state candidate set as the initial state, a demonstration trajectory starting from that target state is obtained, the demonstration set is updated, and a policy is trained on the updated demonstration set, thereby reducing the demonstration cost.
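The loop described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow (select a state, request a demonstration from it, update the demonstration set, retrain, check the condition), not the applicant's implementation. Every name here (`active_irl_loop`, `contribution`, `query_expert`, `train_irl`, `meets_condition`, `max_queries`) is hypothetical, and the abstract does not say how the contribution value is computed or which inverse reinforcement learning algorithm is used, so both are left as caller-supplied functions. The toy example replaces them with trivial stand-ins on a one-dimensional chain.

```python
from typing import Any, Callable, List, Sequence, Tuple

State = int
Action = int
Trajectory = List[Tuple[State, Action]]  # sequence of (state, action) pairs


def active_irl_loop(
    candidates: Sequence[State],
    initial_demos: List[Trajectory],
    contribution: Callable[[State, List[Trajectory]], float],
    query_expert: Callable[[State], Trajectory],
    train_irl: Callable[[List[Trajectory]], Any],
    meets_condition: Callable[[Any], bool],
    max_queries: int = 10,  # assumed budget; not stated in the source
) -> Tuple[Any, List[Trajectory]]:
    """Repeatedly ask the expert for the demonstration expected to help most."""
    demos = list(initial_demos)
    remaining = list(candidates)
    policy = None
    for _ in range(max_queries):
        if not remaining:
            break
        # Pick the candidate state with the maximum contribution value.
        target = max(remaining, key=lambda s: contribution(s, demos))
        remaining.remove(target)
        # Obtain a demonstration that starts from the target state.
        demos.append(query_expert(target))
        # Retrain on the updated demonstration set.
        policy = train_irl(demos)
        if meets_condition(policy):
            break
    return policy, demos


GOAL = 5  # toy chain with states 0..5; the expert always moves right (action 1)


def run_toy_example() -> Tuple[Any, List[Trajectory]]:
    def covered(demos: List[Trajectory]) -> set:
        return {s for traj in demos for s, _ in traj}

    # Stand-in contribution value: states on the way to GOAL not yet demonstrated.
    def contribution(state: State, demos: List[Trajectory]) -> float:
        return len(set(range(state, GOAL)) - covered(demos))

    # Stand-in expert: walks right from the given state until GOAL.
    def query_expert(state: State) -> Trajectory:
        return [(s, 1) for s in range(state, GOAL)]

    # Stand-in for IRL training: a lookup table of demonstrated actions.
    # A real system would learn a reward function and then run RL on it.
    def train_irl(demos: List[Trajectory]) -> dict:
        return {s: a for traj in demos for s, a in traj}

    # Stand-in preset condition: the policy covers every non-goal state.
    def meets_condition(policy: dict) -> bool:
        return all(s in policy for s in range(GOAL))

    return active_irl_loop(
        [3, 0, 2], [], contribution, query_expert, train_irl, meets_condition
    )
```

In the toy run, state 0 has the largest stand-in contribution value, so a single demonstration from it already satisfies the stand-in condition and states 3 and 2 are never queried. That is the cost-saving effect the abstract aims for, shown here only under these simplified assumptions.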

Description

Technical field

[0001] The embodiments of the present application relate to the field of communications, and in particular to an inverse reinforcement learning processing method, device, storage medium, and electronic device.

Background

[0002] Imitation learning is a way of training policies that differs from reinforcement learning. By training a policy from expert demonstrations, it can effectively shorten training time, and it can train effective policies even for problems with sparse reward functions. Inverse reinforcement learning is a typical imitation learning method: it first learns a reward function from expert demonstrations and then applies reinforcement learning to train a policy. However, training a policy with inverse reinforcement learning requires collecting a large number of expert demonstrations. In actual tasks, the cost of collecting expert demonstrations is high, and this cost is time, money, security and o...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 屠要峰, 黄文宇, 黄圣君, 周祥生, 孙康康
Owner: ZTE CORP