End-to-end game robot generation method and system based on multi-class imitation learning

A multi-category imitation learning technology for game robots, classified under manipulators, program-controlled manipulators, manufacturing tools, etc. It addresses problems such as the difficulty of knowing the reward function R, the unscientific classification of game robots, and robot skill levels that cannot support high-quality interactive games.

Active Publication Date: 2018-11-02
NETEASE (HANGZHOU) NETWORK CO LTD

AI Technical Summary

Problems solved by technology

However, in practice the reward function R is often extremely difficult to know.
[0006] However, the behavior cloning method generalizes very poorly because it uses only state-action pairs for supervised learning. The inverse reinforcement learning method requires a large amount of computing resources, because each training iteration must complete a reinforcement learning sub-loop under the current reward function, which makes training slow.
Moreover, traditional game robots are not intelligent enough, their division into skill levels is not scientific enough, and the playing level of robots under such a division cannot support high-quality interactive play with game players or satisfy players' requirements for game experience.



Examples


Embodiment 1

[0041] This embodiment provides an end-to-end game robot generation method based on multi-category imitation learning, including:

[0042] Establishing a player sample database, which includes: the player state features of players of various skill levels during game play, the game actions performed by the players, and several predefined skill level labels;
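As a rough illustration of the three kinds of data listed in [0042], the sketch below stores each recorded step as a state feature vector, an action index, and a skill level label. The class names, field names, and the label names "novice", "intermediate", and "expert" are hypothetical; the patent text does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class PlayerSample:
    """One recorded step: what the player observed, what they did, and their skill label."""
    state: Tuple[float, ...]  # player state features at this step
    action: int               # index of the game action the player performed
    skill_label: int          # index into the predefined skill level labels


@dataclass
class PlayerSampleDatabase:
    skill_levels: Sequence[str]  # predefined labels (hypothetical names)
    samples: List[PlayerSample] = field(default_factory=list)

    def add(self, state: Sequence[float], action: int, skill: str) -> None:
        label = self.skill_levels.index(skill)  # raises ValueError for an unknown label
        self.samples.append(PlayerSample(tuple(state), action, label))

    def by_skill(self) -> Dict[str, List[PlayerSample]]:
        """Group samples by skill level so each category can be sampled separately."""
        groups: Dict[str, List[PlayerSample]] = {name: [] for name in self.skill_levels}
        for sample in self.samples:
            groups[self.skill_levels[sample.skill_label]].append(sample)
        return groups


db = PlayerSampleDatabase(skill_levels=("novice", "intermediate", "expert"))
db.add([0.1, 0.9], action=2, skill="novice")
db.add([0.4, 0.2], action=0, skill="expert")
db.add([0.5, 0.5], action=1, skill="expert")
print({name: len(group) for name, group in db.by_skill().items()})
```

Grouping by label is one convenient way to draw player batches per skill level during training; other layouts would serve equally well.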

[0043] Forming an adversarial network from a policy generator, a policy discriminator, and a policy classifier, all of which are multi-layer neural networks. The policy generator performs imitation learning in the adversarial network and obtains game strategies similar to the game behavior of players of different skill levels, from which a game robot is generated;
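The excerpt does not give the training objectives of the three networks. Purely as an assumption, the sketch below uses the standard cross-entropy forms common in adversarial imitation learning and auxiliary-classifier GANs to show how the signals could relate: the discriminator separates player pairs from generated pairs, the classifier predicts the skill level, and the generator is rewarded for fooling the first while matching the label requested of the second. All function names and the `class_weight` parameter are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-8  # keeps log() finite when a probability is exactly 0 or 1


def discriminator_loss(d_on_player: Sequence[float], d_on_generated: Sequence[float]) -> float:
    """Binary cross-entropy: D should output ~1 on player pairs and ~0 on generated pairs."""
    player_term = -sum(math.log(p + EPS) for p in d_on_player) / len(d_on_player)
    generated_term = -sum(math.log(1.0 - p + EPS) for p in d_on_generated) / len(d_on_generated)
    return player_term + generated_term


def classifier_loss(class_probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Cross-entropy between predicted skill level distributions and the true labels."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(class_probs, labels)) / len(labels)


def generator_reward(d_score: float, class_probs: Sequence[float],
                     target_label: int, class_weight: float = 1.0) -> float:
    """Assumed surrogate reward: larger when D is fooled and C recognizes the requested level."""
    return math.log(d_score + EPS) + class_weight * math.log(class_probs[target_label] + EPS)


# A generated step that fools D and matches the requested label earns more reward.
convincing = generator_reward(0.9, [0.1, 0.8, 0.1], target_label=1)
unconvincing = generator_reward(0.2, [0.6, 0.3, 0.1], target_label=1)
print(convincing > unconvincing)
```

Other reward shapes are used in the literature, for example -log(1 - D); the choice here is only for illustration.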

[0044] The input to the policy generator is a generated state-label pair (S_g, C_i), consisting of the generator state features S_g and an arbitrary skill level label C_i, ...
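One plausible way to feed a state-label pair to a neural policy is to append a one-hot encoding of the label to the state features. The sketch below does this and passes the result through a single linear layer with softmax, standing in for the multi-layer generator. The encoding, the layer sizes, and the random weights are assumptions made for illustration and are not taken from the patent.

```python
import math
import random
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    if not 0 <= index < size:
        raise ValueError("label index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def make_generator_input(state_features: Sequence[float], label_index: int,
                         num_labels: int) -> List[float]:
    """Concatenate state features S_g with a one-hot skill level label C_i."""
    return list(state_features) + one_hot(label_index, num_labels)


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_forward(x: Sequence[float], weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> List[float]:
    """Single linear layer + softmax; a stand-in for a multi-layer policy network."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


rng = random.Random(0)
num_labels, num_actions = 3, 4
x = make_generator_input([0.2, -0.5, 0.7], label_index=1, num_labels=num_labels)
weights = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(num_actions)]
biases = [0.0] * num_actions
probs = policy_forward(x, weights, biases)
action = rng.choices(range(num_actions), weights=probs, k=1)[0]
print(x, action)
```

Changing only `label_index` while holding the state fixed is what lets a single generator produce different behavior per skill category.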

Embodiment 2

[0067] This embodiment provides an end-to-end game robot generation method based on multi-category imitation learning. On the basis of Embodiment 1, an effective convolutional neural network is obtained through transfer learning, and it is used to extract effective features from each frame of player game images and each frame of generated game images, yielding the player state features corresponding to each frame of player game images and the generator state features corresponding to each frame of generated game images.

[0068] In this embodiment, the effective convolutional neural network processes the original high-dimensional game image data and extracts more effective features from it as training data for imitation learning, which yields a game robot that imitates the player's game behavior more closely.
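The role of the feature extractor can be pictured with a toy example: fixed convolution filters turn a frame into a short feature vector that then serves as the state features. The sketch below is not transfer learning itself. It uses two hand-written Sobel filters as a hypothetical stand-in for frozen pretrained weights, followed by ReLU and global average pooling. A real system would load a trained network instead.

```python
from typing import List, Sequence

Image = List[List[float]]

# Hypothetical frozen 3x3 filters standing in for pretrained convolutional weights.
FROZEN_FILTERS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # responds to brightness rising left to right
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # responds to brightness rising top to bottom
]


def conv2d_valid(image: Image, kernel: Sequence[Sequence[float]]) -> Image:
    """Plain 2D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def extract_features(frame: Image) -> List[float]:
    """One feature per filter: ReLU, then global average pooling over the feature map."""
    features = []
    for kernel in FROZEN_FILTERS:
        feature_map = conv2d_valid(frame, kernel)
        activated = [max(0.0, v) for row in feature_map for v in row]
        features.append(sum(activated) / len(activated))
    return features


# A 5x5 frame whose brightness increases from left to right.
frame = [[float(col) for col in range(5)] for _ in range(5)]
print(extract_features(frame))
```

The same `extract_features` call would be applied to player frames and to generated frames, so both sides of the adversarial comparison live in the same feature space.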

Embodiment 3

[0070] This embodiment provides an end-to-end game robot generation method based on multi-category imitation learning. On the basis of Embodiment 1 or 2, the gradient updates of the policy discriminator D_ω and the policy classifier C_ψ can use the momentum-based gradient of ADAM or an ordinary gradient update. The policy generator G_θ can use stable incremental policy gradient update methods from reinforcement learning, such as PPO or TRPO, and techniques such as GAE can be used to weaken the influence of variance on gradient updates. With this end-to-end multi-category imitation learning based on the auxiliary classifier generative adversarial network mechanism, after continued training the policy generator G_θ can become a multi-category policy approximator that generates game strategies similar to the players' game behavior under multiple categories.
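For readers unfamiliar with the techniques named in [0070], the sketch below shows the standard GAE recursion and the PPO clipped surrogate for a single sample. The hyperparameter defaults (gamma, lam, clip_eps) are common choices and are assumptions here; the patent excerpt does not state values. The rewards would come from whatever signal the adversarial network supplies to the generator.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum of residuals
        advantages[t] = running
    return advantages


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO surrogate for one sample: the pessimistic minimum of the raw and clipped terms."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


advantages = gae_advantages(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0, 0.0],
                            gamma=1.0, lam=1.0)
print(advantages)
print(ppo_clipped_objective(ratio=1.5, advantage=advantages[0]))
```

Lower values of `lam` trade variance for bias, which is the sense in which GAE weakens the influence of variance on the gradient update. The clipping keeps each policy step small, which is what makes PPO a stable incremental update.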

[0071] In this embodiment, when the policy discriminator D is close to ...


Abstract

The invention relates to an end-to-end game robot generation method and system based on multi-category imitation learning, which yields game robots whose playing level is more comparable to that of players of different skill levels. The method comprises the following steps: establishing a player sample database; and forming an adversarial network from a policy generator, a policy discriminator, and a policy classifier, all of which are multi-layer neural networks. The policy generator performs imitation learning in the adversarial network and obtains game strategies similar to the game behavior of players of different skill levels, from which game robots are generated. With the method and system provided by the invention, game robots of multiple categories can be obtained, and the robot of each category can imitate a game strategy close to that of players in the corresponding category.

Description

Technical field

[0001] The invention belongs to the technical field of automatic game robots, and in particular relates to an end-to-end game robot generation method and system based on multi-category imitation learning.

Background technique

[0002] Traditional game robots are not intelligent enough, their division into skill levels is not scientific enough, and the playing level of robots under such a division cannot support high-quality interactive play with game players or satisfy players' requirements for game experience.

[0003] To obtain multi-category game robots, the prior art generally uses traditional imitation learning and inverse reinforcement learning methods, but each of these methods has its own defects when applied to the simulation training of multi-category game robots.

[0004] Imitation learning is usually divided into two categories: one is the behavior cloning method, which uses player trajectory ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): B25J9/16
CPC: B25J9/163
Inventor: 章宗长, 林嘉豪, 陈赢峰, 范长杰
Owner: NETEASE (HANGZHOU) NETWORK CO LTD