Game strategy optimization method and system and storage medium

A game strategy optimization method, applied in the field of artificial intelligence, that addresses problems such as credit assignment, failure of the Markov property, and inaccurate best-response and average strategies, with the effect of improving accuracy and balancing exploration and exploitation.

Active Publication Date: 2020-06-16

HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)

Cites: 4; Cited by: 22

AI Technical Summary

Problems solved by technology

In NFSP (Neural Fictitious Self-Play), each generated sample corresponds to a fixed opponent strategy. If the influence of the other agents' strategies cannot be perceived, the learned best-response strategy and average strategy are inaccurate, and the Markov property of the MDP (Markov decision process) fails.

In addition, multi-agent games raise further problems such as the curse of dimensionality, credit assignment, and global exploration.

Method used

Examples

Embodiment Construction

[0028] To address the problem that Neural Fictitious Self-Play (NFSP) does not extend effectively to complex multi-player games, the present invention discloses a game strategy optimization method based on multi-agent reinforcement learning and fictitious self-play. It uses centralized training with decentralized execution to improve the accuracy of the action-value network, and introduces a global baseline reward to measure each agent's action return more accurately, which addresses the credit assignment problem in multi-player games. It also introduces the maximum entropy method for policy evaluation, which balances exploration and exploitation during policy optimization.
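The maximum-entropy and baseline ideas in [0028] can be illustrated with a minimal single-state sketch. It is not the patent's algorithm: `q_values` is a hypothetical stand-in for the action values a centralized critic would supply, the baseline is assumed to be the policy-weighted average of those values (the patent's exact global baseline is not shown in this excerpt), and `alpha` is an assumed entropy weight. The function ascends the objective sum_a pi(a) Q(a) + alpha * H(pi) over the logits of a softmax policy.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities of a softmax policy."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def softmax(logits):
    return [math.exp(lp) for lp in log_softmax(logits)]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_gradient(logits, q_values, alpha):
    """Exact gradient of sum_a pi(a) Q(a) + alpha * H(pi) w.r.t. the logits."""
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]
    # Assumed baseline: expected action value under the current policy.
    baseline = sum(p * q for p, q in zip(probs, q_values))
    h = entropy(probs)
    return [
        p * ((q - baseline) - alpha * (lp + h))
        for p, lp, q in zip(probs, log_probs, q_values)
    ]


def optimize(q_values, alpha, steps=2000, lr=0.5):
    """Plain gradient ascent on the logits, starting from a uniform policy."""
    logits = [0.0] * len(q_values)
    for _ in range(steps):
        grad = max_entropy_gradient(logits, q_values, alpha)
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


q = [1.0, 0.5, 0.0]               # hypothetical critic outputs
greedy = optimize(q, alpha=0.0)   # no entropy term: drifts toward the best action
soft = optimize(q, alpha=0.5)     # entropy term keeps the other actions in play
```

With `alpha = 0` the policy concentrates on the highest-valued action, while with `alpha > 0` the maximizer of this objective is proportional to exp(Q/alpha), so lower-valued actions keep nonzero probability. That is the exploration and exploitation trade-off the entropy term controls.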

[0029] Assumptions and Definitions:

[0030] Reinforcement learning is defined as learning how to map from a state to an action in order to maximize a numerical reward signal. The process of reinforcement learning can be regarded as th...

Abstract

The invention provides a game strategy optimization method, system, and storage medium. The method comprises a step of establishing a maximum-entropy-based policy gradient algorithm and a step of solving for the multi-agent best-response strategy. Its beneficial effects are as follows: centralized training with decentralized execution improves the accuracy of the action-value network; a global baseline reward measures each agent's action return more accurately, which addresses the credit assignment problem in multi-player games; and the maximum entropy method, introduced for policy evaluation, balances exploration and exploitation during policy optimization.

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a game strategy optimization method, system and storage medium based on multi-agent reinforcement learning and fictitious self-play.

Background

[0002] Many decision-making problems in real-world scenarios can be modeled as strategy-solving problems in incomplete-information games. However, current machine game algorithms need to abstract the state space of the problem, do not perform well in high-dimensional action spaces, and are usually suitable only for two-player games, whereas most games in practical problems are multi-player games.

[0003] Neural Fictitious Self-Play (NFSP) is a game strategy solving method that has attracted a lot of attention in the field of machine games. It learns through self-play, using deep reinforcement learning and supervised learning respectively to realize the computation of the best-response strategy and the update of...
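The two learning components of NFSP mentioned in [0003] can be sketched with a toy agent for a single-state matrix game. This is a tabular illustration under stated assumptions, not the patent's method: the class `ToyNFSPAgent`, the mixing probability `eta`, and the matching-pennies payoff are all hypothetical, a running-average table `q` stands in for the reinforcement-learned best-response network, and action frequencies in a reservoir buffer stand in for the supervised average-strategy network.

```python
import random
from collections import Counter


class ReservoirBuffer:
    """Keeps a fixed-size uniform sample of everything added so far."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


class ToyNFSPAgent:
    def __init__(self, n_actions, eta=0.1, capacity=200, seed=0):
        self.n_actions = n_actions
        self.eta = eta                      # probability of playing the best response
        self.rng = random.Random(seed)
        self.q = [0.0] * n_actions          # stand-in for the best-response network
        self.memory = ReservoirBuffer(capacity, self.rng)

    def average_policy(self):
        """Empirical frequencies of stored best-response actions."""
        total = len(self.memory.items)
        if total == 0:
            return [1.0 / self.n_actions] * self.n_actions
        counts = Counter(self.memory.items)
        return [counts[a] / total for a in range(self.n_actions)]

    def act(self):
        if self.rng.random() < self.eta:
            action = max(range(self.n_actions), key=lambda a: self.q[a])
            self.memory.add(action)         # only best-response actions are stored
            return action
        weights = self.average_policy()
        return self.rng.choices(range(self.n_actions), weights=weights)[0]

    def update_q(self, action, reward, lr=0.1):
        self.q[action] += lr * (reward - self.q[action])


def self_play(rounds=5000):
    """Matching pennies: agent a wins when actions match, agent b otherwise."""
    a, b = ToyNFSPAgent(2, seed=1), ToyNFSPAgent(2, seed=2)
    for _ in range(rounds):
        x, y = a.act(), b.act()
        reward = 1.0 if x == y else -1.0
        a.update_q(x, reward)
        b.update_q(y, -reward)
    return a, b


agent_a, agent_b = self_play()
```

Each agent here treats its opponent as part of a fixed environment when it updates `q`, which is the limitation the patent attributes to NFSP in multi-player settings.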

Claims

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N5/04, G06N3/08, G06N20/00

CPC: G06N3/08, G06N5/042, G06N20/00

Inventor: 王轩, 漆舒汉, 张加佳, 胡书豪, 黄旭忠, 刘洋, 蒋琳, 廖清, 夏文, 李化乐

Owner: HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)
