Training and control method and device, computing equipment and medium

A training method that uses position information, applied in the field of reinforcement learning, can address the lack of interaction between smart objects and the resulting loss of rationality in their behavior.

Pending Publication Date: 2021-12-31
SENSETIME GRP LTD +1

AI Technical Summary

Problems solved by technology

Because the reward information used to train each smart object is the same, each trained smart object attends only to its own behavior in a given scenario and ignores the behavior of surrounding smart objects. This lack of interaction between smart objects also makes their behavior less rational.



Embodiment Construction

[0088] In order to make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, but not all, embodiments of the present disclosure. The components of the disclosed embodiments generally described and illustrated herein may be arranged and designed in a variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection ...


Abstract

The invention provides a training and control method and device, computing equipment, and a medium. The training method comprises: obtaining initial reward information for each intelligent object in a target system for the current round, where the initial reward information represents the degree to which the corresponding intelligent object has completed its task in the current round; for each of at least one intelligent object in the target system, determining the target reward information of that intelligent object for the current round based on the position information and initial reward information of each intelligent object and on the local coordination coefficient of the current round; adjusting the network parameters of the intelligent object's operation network based on its target reward information for the current round; and determining the initial reward information of the next round based on the adjusted network parameters. These steps are repeated until the training of each intelligent object's operation network reaches a preset training cut-off condition, yielding the trained operation networks.
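The training round described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how position information, initial rewards and the local coordination coefficient are combined, so everything specific here is an assumption made for illustration only: neighbours are the objects within a hypothetical `radius`, the target reward is a linear blend of an object's own reward and the mean reward of its neighbours, the coefficient lies in [0, 1], and the cut-off condition is a fixed number of rounds. The names `Agent`, `target_rewards`, `train`, `rollout`, `update` and `coordination_for_round` are likewise hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """One intelligent object's state for the current round."""
    position: Tuple[float, float]
    initial_reward: float  # task completion degree in this round


def target_rewards(agents: List[Agent], coordination: float,
                   radius: float) -> List[float]:
    """Blend each agent's own reward with the mean reward of nearby agents.

    Assumption: `coordination` in [0, 1] weights the neighbour term, and
    agents within `radius` (Euclidean distance) count as neighbours.
    """
    result = []
    for i, agent in enumerate(agents):
        nearby = [
            other.initial_reward
            for j, other in enumerate(agents)
            if j != i and math.dist(agent.position, other.position) <= radius
        ]
        # With no neighbours, fall back to the agent's own reward.
        neighbour_mean = sum(nearby) / len(nearby) if nearby else agent.initial_reward
        result.append((1 - coordination) * agent.initial_reward
                      + coordination * neighbour_mean)
    return result


def train(policies: list,
          rollout: Callable[[list], List[Agent]],
          update: Callable[[object, float], None],
          coordination_for_round: Callable[[int], float],
          radius: float,
          max_rounds: int) -> list:
    """Repeat the round until the (assumed) cut-off: a fixed round budget."""
    for round_index in range(max_rounds):
        # Running the current policies gives this round's positions and
        # initial rewards.
        agents = rollout(policies)
        rewards = target_rewards(agents, coordination_for_round(round_index), radius)
        for policy, reward in zip(policies, rewards):
            update(policy, reward)  # adjust this agent's operation network
    return policies
```

In this sketch, `rollout` and `update` stand in for running the operation networks in the target system and for whatever reinforcement learning update is used; with a coefficient of 0 each object is trained on its own reward only, which corresponds to the independent training that the background section criticizes.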

Description

Technical field

[0001] The present disclosure relates to the technical field of reinforcement learning, and in particular to a training and control method, apparatus, computing device and medium.

Background

[0002] A self-propelled particle system is a common model for systems of multiple smart objects, for example a traffic flow system at a traffic intersection or a school of fish. Training the smart objects in a self-propelled particle system makes it possible to simulate their behavior effectively.

[0003] In the prior art, most smart objects are trained by means of reinforcement learning. Specifically, each smart object is trained independently and iteratively with fixed reward information. Since the reward information used to train each smart object is the same, each smart object obtained by training only cares about the behavior of the individual in various scenarios, ignoring ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G05B13/04
CPC: G05B13/042
Inventors: 彭正皓, 黎权毅, 刘春晓, 周博磊
Owner: SENSETIME GRP LTD