Inverse reinforcement learning processing method and device, storage medium and electronic device

An inverse reinforcement learning processing method, applied in the communication field, that addresses the high cost of collecting expert demonstrations and reduces the demonstration cost.

Pending Publication Date: 2022-01-11

ZTE CORP +1

0 Cites, 1 Cited by


AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present application provide an inverse reinforcement learning processing method, device, storage medium, and electronic device that at least solve a problem in the related art: inverse reinforcement learning requires a large number of expert demonstrations, and collecting them is costly.


Embodiment Construction

[0115] Embodiments of the present application will be described in detail below with reference to the drawings and in combination with the embodiments.

[0116] It should be noted that the terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.

[0117] The method embodiments provided in the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Figure 1 is a block diagram of the hardware structure of a mobile terminal running the inverse reinforcement learning processing method of an embodiment of the present application. As shown in Figure 1, the mobile terminal may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a...


Abstract

The embodiments of the invention provide an inverse reinforcement learning processing method and device, a storage medium, and an electronic device. The method comprises: obtaining a target demonstration trajectory whose initial state is a target state from a preset state candidate set; adding the target demonstration trajectory to the initialized demonstration set to obtain an updated demonstration set; and performing inverse reinforcement learning training on the updated demonstration set to obtain a target policy that meets a preset condition. The invention addresses the problem in the related art that inverse reinforcement learning requires a large number of expert demonstrations, which are costly to collect. The most valuable state (the target state with the maximum contribution value) is selected from the state candidate set as the initial state, a demonstration trajectory starting from that state is obtained, the demonstration set is updated, and a policy is trained on the updated set, thereby reducing the demonstration cost.
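The steps in the abstract can be sketched as a query loop. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the contribution value is computed, how training is done, or whether the steps repeat, so `contribution`, `train_irl`, `meets_condition`, `max_queries`, and the toy chain environment are all hypothetical. The training function in particular is a trivial stand-in and not an inverse reinforcement learning algorithm.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable
Trajectory = List[Tuple[State, int]]  # (state, action) pairs


def active_irl(
    candidates: Sequence[State],
    contribution: Callable[[State, List[Trajectory]], float],
    expert_demo: Callable[[State], Trajectory],
    train_irl: Callable[[List[Trajectory]], object],
    meets_condition: Callable[[object], bool],
    max_queries: int = 10,  # assumed query budget, not from the source
):
    demos: List[Trajectory] = []  # initialized (empty) demonstration set
    remaining = list(candidates)
    policy = None
    for _ in range(max_queries):
        if not remaining:
            break
        # Target state: the candidate with the maximum contribution value.
        target = max(remaining, key=lambda s: contribution(s, demos))
        remaining.remove(target)
        # Ask the expert for a trajectory that starts at the target state.
        demos.append(expert_demo(target))
        # Train on the updated demonstration set (placeholder callable).
        policy = train_irl(demos)
        if meets_condition(policy):
            break
    return policy, demos


# Toy setting (all hypothetical): states 0..4 on a chain, goal at 5,
# and an expert that always takes action 1 ("move right").
GOAL = 5


def expert_demo(start: int) -> Trajectory:
    return [(s, 1) for s in range(start, GOAL)]


def contribution(state: int, demos: List[Trajectory]) -> float:
    # Hypothetical score: how many not-yet-demonstrated states a
    # demonstration from `state` would cover.
    covered = {s for traj in demos for s, _ in traj}
    return sum(1 for s in range(state, GOAL) if s not in covered)


def train_irl(demos: List[Trajectory]) -> dict:
    # Stand-in for IRL training: copies the demonstrated action per state.
    return {s: a for traj in demos for s, a in traj}


def meets_condition(policy: dict) -> bool:
    return len(policy) == GOAL  # every state has an action


policy, demos = active_irl(
    [3, 0, 4, 1, 2], contribution, expert_demo, train_irl, meets_condition
)
```

In this toy setting a single demonstration from state 0 covers every state, so the loop stops after one expert query. Choosing where to ask for a demonstration, rather than collecting many from arbitrary starts, is what saves cost here.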

Description

Technical field

[0001] The embodiments of the present application relate to the communication field, and in particular to an inverse reinforcement learning processing method, device, storage medium, and electronic device.

Background

[0002] Imitation learning is a way of training policies that differs from reinforcement learning. By training a policy from expert demonstrations, it can effectively shorten training, and it can train effective policies even for problems with sparse reward functions. Inverse reinforcement learning is a typical imitation learning method: it first learns a reward function from expert demonstrations and then trains a policy by reinforcement learning. However, training a policy by inverse reinforcement learning requires a large number of expert demonstrations. In practical tasks, collecting expert demonstrations is costly, and this cost is time, money, security and o...
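The two-stage idea in the background (first learn a reward function from expert demonstrations, then train a policy by reinforcement learning) can be illustrated on a toy problem. The sketch below is an assumption-laden simplification and not the algorithm of the application: the five-state chain, the always-move-right expert, and the reward estimate (discounted expert state visitation minus a uniform baseline) are all hypothetical, and the estimate is a crude stand-in for a real inverse reinforcement learning algorithm. The second stage uses plain value iteration.

```python
from typing import List

N_STATES, GAMMA, HORIZON = 5, 0.9, 8
LEFT, RIGHT = 0, 1


def step(state: int, action: int) -> int:
    """Deterministic chain dynamics, clipped at both ends."""
    delta = 1 if action == RIGHT else -1
    return min(max(state + delta, 0), N_STATES - 1)


def expert_states(start: int) -> List[int]:
    """Hypothetical expert: always moves right; returns visited states."""
    states, s = [], start
    for _ in range(HORIZON):
        states.append(s)
        s = step(s, RIGHT)
    return states


def learn_reward(demos: List[List[int]]) -> List[float]:
    """Stage 1: discounted expert visitation minus a uniform baseline."""
    visits = [0.0] * N_STATES
    for traj in demos:
        for t, s in enumerate(traj):
            visits[s] += GAMMA ** t / len(demos)
    baseline = sum(visits) / N_STATES
    return [v - baseline for v in visits]


def plan(reward: List[float], sweeps: int = 200) -> List[int]:
    """Stage 2: value iteration, then the greedy action for each state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            reward[s] + GAMMA * max(values[step(s, a)] for a in (LEFT, RIGHT))
            for s in range(N_STATES)
        ]
    return [
        max((LEFT, RIGHT), key=lambda a: values[step(s, a)])
        for s in range(N_STATES)
    ]


demos = [expert_states(0), expert_states(2)]
reward = learn_reward(demos)  # highest at the state the expert ends in
policy = plan(reward)         # greedy action per state under that reward
```

The estimated reward is largest at the rightmost state, where the expert spends most of its discounted time, so the planned policy moves right from every state, matching the expert.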


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N20/00

CPC: G06N20/00

Inventor: 屠要峰, 黄文宇, 黄圣君, 周祥生, 孙康康

Owner: ZTE CORP
