Strategy model training method and device, electronic equipment and storage medium

A strategy model training technique, applied in the computer field, that addresses the high cost of high-fidelity simulators, the complexity of industrial systems, and the resulting poor control of industrial systems.

Pending Publication Date: 2021-12-07
JINGDONG CITY BEIJING DIGITS TECH CO LTD

AI Technical Summary

Problems solved by technology

However, due to the complexity of industrial systems, the cost of obtaining high-fidelity simulators corresponding to real industrial systems is extremely high, and there is no virtual environment close to the real scene to interact with. As a result, the trained control strategy model controls the industrial system poorly.

Examples

Detailed Description of Embodiments

[0030] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are intended to explain the present application, and should not be construed as limiting the present application.

[0031] The policy model training method, device and electronic equipment according to the embodiments of the present application are described below with reference to the accompanying drawings.

[0032] Figure 1 is a schematic flowchart of a policy model training method provided in an embodiment of the present application. It should be noted that the method provided in this embodiment is executed by a policy model training device, and the policy model training device can be im...

Abstract

The invention provides a strategy model training method and device, electronic equipment, and a storage medium. The method comprises: obtaining a historical operation data set of industrial control scene information and pre-training on it to obtain a system dynamic characteristic model, a pre-trained income model, a pre-trained loss model, and a pre-trained strategy model; mixing a simulated operation data set, obtained from the pre-trained strategy model and the system dynamic characteristic model, with the historical operation data set to obtain a mixed data set; and performing joint training on the mixed data set, based on the pre-trained strategy model, the pre-trained income model, and the pre-trained loss model, so that the loss data output by the trained loss model is smaller than a preset loss threshold and the income data output by the income model reaches its maximum value. Mixing the simulated and historical operation data sets and training on the mixture realizes offline reinforcement learning and gives a better control strategy optimization effect in the industrial control scene.
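The data-mixing step in the abstract can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the patent's implementation: `simulate_rollouts`, `sample_mixed_batch`, the scalar state and action, the toy policy and dynamics functions, and the `real_fraction` mixing ratio are all assumptions, since the abstract does not specify model forms or how the two data sets are proportioned. The joint training of the strategy, income, and loss models is omitted.

```python
import random
from typing import Callable, List, Tuple

# (state, action, next_state); scalars keep the toy example readable.
Transition = Tuple[float, float, float]


def simulate_rollouts(
    policy: Callable[[float], float],
    dynamics: Callable[[float, float], float],
    start_states: List[float],
    horizon: int,
) -> List[Transition]:
    """Roll a policy through a learned dynamics model; no real system is touched."""
    simulated: List[Transition] = []
    for state in start_states:
        for _ in range(horizon):
            action = policy(state)
            next_state = dynamics(state, action)
            simulated.append((state, action, next_state))
            state = next_state
    return simulated


def sample_mixed_batch(
    historical: List[Transition],
    simulated: List[Transition],
    real_fraction: float,
    batch_size: int,
    rng: random.Random,
) -> List[Transition]:
    """Draw one training batch from both data sets (sampling with replacement)."""
    n_real = int(batch_size * real_fraction)  # assumed fixed mixing ratio
    n_sim = batch_size - n_real
    batch = rng.choices(historical, k=n_real) + rng.choices(simulated, k=n_sim)
    rng.shuffle(batch)
    return batch


# Hypothetical stand-ins for the pre-trained strategy and dynamics models.
def toy_policy(state: float) -> float:
    return -0.5 * state


def toy_dynamics(state: float, action: float) -> float:
    return 0.9 * state + action


historical = [(10.0, -1.0, 8.0), (8.0, -1.0, 6.2)]  # made-up logged transitions
simulated = simulate_rollouts(toy_policy, toy_dynamics, start_states=[1.0], horizon=3)
batch = sample_mixed_batch(historical, simulated, real_fraction=0.5,
                           batch_size=8, rng=random.Random(0))
```

In a fuller setup, batches like `batch` would feed the joint training loop, which, per the abstract, continues until the loss model's output falls below the preset threshold while the income model's output is maximized.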

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a training method, device, electronic equipment and storage medium for a strategy model.

Background

[0002] At present, industrial systems contain a large number of production control links. In the related art, the operation of each production control link is usually governed by a control strategy model learned with an online reinforcement learning framework. To train such a model, however, the online reinforcement learning framework requires a large amount of trial-and-error interaction with a high-fidelity simulator or the real system environment, collecting system state and action data for model optimization. However, due to the complexity of industrial systems, the cost of obtaining high-fidelity simulators corresponding to real industrial systems is extremely high, and the...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G05B13/04, G06N3/04, G06N3/08
CPC: G05B13/042, G06N3/08, G06N3/045
Inventor: 殷宏磊, 詹仙园, 张玥, 霍雨森, 朱翔宇, 郑宇
Owner JINGDONG CITY BEIJING DIGITS TECH CO LTD