Game strategy optimization method and system and storage medium

A game strategy optimization technology, applied in the field of artificial intelligence, that addresses problems such as credit assignment, failure of the Markov property, and inaccurate best-response and average strategies, improving accuracy and balancing exploration and exploitation.

Active Publication Date: 2020-06-16
HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)
Cites: 4; Cited by: 22

AI Technical Summary

Problems solved by technology

In NFSP (Neural Fictitious Self-Play), each generated sample corresponds to a fixed opponent strategy. If an agent cannot perceive the strategy changes introduced by the other agents, the best-response strategy and average strategy it learns are inaccurate, and the Markov property of the MDP (Markov decision process) no longer holds.
In addition, multi-agent games suffer from problems such as the curse of dimensionality, credit assignment, and global exploration.
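The "average strategy" above comes from fictitious play, where a player's average strategy is the running mean of the best responses it has played so far. The Python sketch below shows that averaging step for a single normal-form strategy vector. It illustrates the general idea only, not the patent's method: in NFSP the average strategy is learned by a supervised network rather than computed exactly, and the function name is hypothetical.

```python
from typing import List, Sequence


def fictitious_play_average(best_responses: Sequence[Sequence[float]]) -> List[float]:
    """Running average of a sequence of mixed strategies (best responses).

    Uses the classic fictitious-play update
        avg_k = avg_{k-1} + (br_k - avg_{k-1}) / k
    for the k-th best response (1-indexed), so the result equals the plain
    mean of every best response seen so far.
    """
    if not best_responses:
        raise ValueError("need at least one best response")
    avg = list(best_responses[0])
    for k, br in enumerate(best_responses[1:], start=2):
        # Move the average a fraction 1/k of the way toward the new best response.
        avg = [a + (b - a) / k for a, b in zip(avg, br)]
    return avg


# Rock, paper, scissors: cycling best responses average out to the uniform strategy.
print(fictitious_play_average([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

Each best response is computed against the opponents as they were at that moment, so if the other agents keep changing, the averaged strategy mixes responses to opponents that no longer exist.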

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Embodiment Construction

[0028] To address the problem that the fictitious self-play algorithm (NFSP) cannot be effectively extended to complex multi-player games, the present invention discloses a game strategy optimization method based on multi-agent reinforcement learning and fictitious self-play. The method uses centralized training and decentralized execution to improve the accuracy of the action-value network, and introduces a global baseline reward to measure the payoff of each agent's actions more accurately, thereby solving the credit assignment problem in multi-player games. In addition, the maximum entropy method is introduced into policy evaluation to balance exploration and exploitation during policy optimization.
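The excerpt does not give the formula for the global baseline reward. One well-known way to build such a baseline under centralized training is the counterfactual baseline of COMA (counterfactual multi-agent policy gradients): a centralized critic scores each alternative action of one agent while the other agents' actions are held fixed, and the baseline is the expectation of those scores under that agent's own policy. The Python sketch below assumes that form; it is a stand-in for illustration, not necessarily the patent's formula, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def counterfactual_advantage(
    q_joint: Sequence[float],
    policy: Sequence[float],
    taken_action: int,
) -> float:
    """Advantage of one agent's action over a policy-weighted counterfactual baseline.

    q_joint[u]: centralized critic's value Q(s, (u_-a, u)) when this agent
                plays u while every other agent's action u_-a stays fixed.
    policy[u]:  this agent's own (decentralized) probability of playing u.
    """
    if len(q_joint) != len(policy):
        raise ValueError("q_joint and policy must cover the same actions")
    # Baseline: expected joint value if this agent re-drew its action from its policy.
    baseline = sum(p * q for p, q in zip(policy, q_joint))
    return q_joint[taken_action] - baseline


# The agent took action 1; the baseline is 0.2*1.0 + 0.5*3.0 + 0.3*2.0 = 2.3.
adv = counterfactual_advantage([1.0, 3.0, 2.0], [0.2, 0.5, 0.3], taken_action=1)
print(round(adv, 6))  # 0.7
```

Only the critic needs the global state and the joint action, so it is used during training; at execution time each agent acts from its own policy alone, which is the centralized-training, decentralized-execution split. Because the other agents' actions are held fixed, the advantage isolates this agent's contribution, which is how such a baseline helps with credit assignment.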

[0029] Assumptions and Definitions:

[0030] Reinforcement learning is defined as learning how to map from a state to an action in order to maximize a numerical reward signal. The process of reinforcement learning can be regarded as th...
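To make "maximize a numerical reward signal" concrete, the quantity usually maximized is the discounted return, computed below for one finished episode. The discount factor `gamma` and the indexing convention (`rewards[t]` is the reward received after the action at step t) are assumptions, since the paragraph above is cut off before it defines them; the function name is hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    returns = [0.0] * len(rewards)
    g = 0.0
    # Sweep backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```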


Abstract

The invention provides a game strategy optimization method, a game strategy optimization system and a storage medium. The game strategy optimization method comprises a step of establishing a policy gradient algorithm based on maximum entropy and a step of solving the multi-agent best-response strategy. The beneficial effects of the method are as follows: a centralized training and decentralized execution mode is adopted, which improves the accuracy of the action-value network; meanwhile, a global baseline reward is introduced to measure the payoff of an agent's actions more accurately, which solves the credit assignment problem in multi-player games. In addition, a maximum entropy method is introduced for policy evaluation, balancing exploration and exploitation during policy optimization.
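The abstract does not state the maximum entropy objective. A common form, used in soft Q-learning and soft actor-critic, adds an entropy bonus weighted by a temperature `alpha` to the expected action value; for a discrete action set the maximizing policy is a softmax of Q/alpha, and its value equals alpha times the log-sum-exp of Q/alpha. The Python sketch below assumes that form; the function names are hypothetical and it is not presented as the patent's exact objective.

```python
import math
from typing import List, Sequence


def soft_policy(q_values: Sequence[float], alpha: float) -> List[float]:
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha).

    This is the policy that maximizes E_pi[Q(a)] + alpha * H(pi) over a
    discrete action set; alpha is the entropy temperature.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def soft_value(q_values: Sequence[float], policy: Sequence[float], alpha: float) -> float:
    """Entropy-regularized value E_pi[Q(a) - alpha * log pi(a)]."""
    # Actions with zero probability contribute nothing (p * log p -> 0).
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(policy, q_values) if p > 0)


q = [1.0, 2.0, 3.0]
for alpha in (0.1, 1.0, 10.0):
    pi = soft_policy(q, alpha)
    print(alpha, [round(p, 3) for p in pi], round(soft_value(q, pi, alpha), 3))
```

A small `alpha` concentrates the policy on the highest-valued action (exploitation), while a large `alpha` pushes it toward uniform (exploration), which is the trade-off the abstract refers to.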

Description

Technical Field

[0001] The invention relates to the technical field of artificial intelligence, and in particular to a game strategy optimization method, system and storage medium based on multi-agent reinforcement learning and fictitious self-play.

Background

[0002] Many decision-making problems in real-world scenarios can be modeled as strategy-solving problems in imperfect-information games. However, current machine game-playing algorithms must abstract the state space of the problem, perform poorly in high-dimensional action spaces, and are usually suitable only for two-player games, whereas most games in practical problems are multi-player games.

[0003] Neural Fictitious Self-Play (NFSP) is a game strategy solving method that has attracted much attention in the field of machine game playing. It learns through self-play and uses deep reinforcement learning and supervised learning to realize the computation of the best response strategy and the update of...
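The NFSP loop described in [0003] can be sketched following the original NFSP paper by Heinrich and Silver: with probability `eta` (the anticipatory parameter) an agent plays an epsilon-greedy best response from its Q-values, otherwise it samples from its average strategy, and only best-response actions are stored, by reservoir sampling, as supervised-learning data for the average strategy. In this sketch the Q-values and average policy are plain lists standing in for the two neural networks, states are integers, and all names are hypothetical; it is not the patent's algorithm.

```python
import random
from typing import List, Sequence, Tuple


class ReservoirBuffer:
    """Keeps a uniform random sample of everything added (reservoir sampling)."""

    def __init__(self, capacity: int, rng: random.Random) -> None:
        self.capacity = capacity
        self.rng = rng
        self.items: List[Tuple[int, int]] = []  # (state, action) pairs
        self.seen = 0

    def add(self, item: Tuple[int, int]) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def nfsp_act(
    state: int,
    q_values: Sequence[float],
    avg_policy: Sequence[float],
    eta: float,
    epsilon: float,
    sl_memory: ReservoirBuffer,
    rng: random.Random,
) -> int:
    """Choose an action by mixing best-response and average-strategy play."""
    if rng.random() < eta:
        # Best-response mode: epsilon-greedy on the Q-values (the RL part).
        if rng.random() < epsilon:
            action = rng.randrange(len(q_values))
        else:
            action = max(range(len(q_values)), key=lambda a: q_values[a])
        # Only best-response behaviour becomes supervised data for the average strategy.
        sl_memory.add((state, action))
        return action
    # Average-strategy mode: sample from the supervised (average) policy.
    return rng.choices(range(len(avg_policy)), weights=avg_policy, k=1)[0]


rng = random.Random(0)
memory = ReservoirBuffer(capacity=1000, rng=rng)
action = nfsp_act(0, [0.1, 0.9], [0.5, 0.5], eta=0.1, epsilon=0.06, sl_memory=memory, rng=rng)
```

Reservoir sampling keeps the supervised data an unbiased sample of all past best responses, so the network trained on it approximates their average rather than only the most recent ones.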

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N5/04; G06N3/08; G06N20/00
CPC: G06N3/08; G06N5/042; G06N20/00
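Inventors: 王轩 (Wang Xuan), 漆舒汉 (Qi Shuhan), 张加佳 (Zhang Jiajia), 胡书豪 (Hu Shuhao), 黄旭忠 (Huang Xuzhong), 刘洋 (Liu Yang), 蒋琳 (Jiang Lin), 廖清 (Liao Qing), 夏文 (Xia Wen), 李化乐 (Li Huale)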
Owner HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)