Deep reinforcement learning method for second- and third-round card placement in Big Pineapple poker

A reinforcement learning and poker technology, applied in knowledge-based computer systems, biological neural network models, instruments, etc., that addresses the scarcity of expert data

Inactive Publication Date: 2019-06-11
SOUTH CHINA UNIV OF TECH
0 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

Although Libratus convincingly defeated four top human poker players at no-limit Texas Hold'em in January 2017, its complex system design is still far from general-purpose artificial intelligence.

Examples

Embodiment

[0049] This embodiment is applied to Big Pineapple poker, a playing-card game that uses a 52-card deck (the two jokers are removed) and has five rounds of dealing. In the first round each player is dealt five cards; starting with the player after the dealer, players place their cards face up in turn, and a card's position cannot be changed once it has been placed. From the second round to the fifth round each player is dealt three cards. The player must choose one of them to put in the discard area, which is invisible to the opponent, and place the remaining cards in the card lanes. This embodiment addresses decision-making in the second and third rounds, but it can be extended to the first, fourth, and fifth rounds. As shown in Figure 1, in order to speed up network convergence, the first round uses the neural network obtained by supervised learning to ...
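The dealing rules above imply a small action set in rounds two to five: pick one of three cards to discard, then assign each of the other two to a lane. The following is a minimal sketch of that enumeration, not the patent's implementation. It assumes the usual Pineapple layout of three lanes holding 3, 5, and 5 cards, which this excerpt does not state; `build_deck`, `legal_actions`, and `ROW_CAPACITY` are hypothetical names.

```python
from itertools import product

# Assumption: three lanes with the usual capacities (not stated in the excerpt).
ROW_CAPACITY = {"top": 3, "middle": 5, "bottom": 5}


def build_deck():
    """Return the 52-card deck (no jokers) as rank+suit strings."""
    ranks = "23456789TJQKA"
    suits = "shdc"
    return [r + s for r in ranks for s in suits]


def legal_actions(hand, rows):
    """Enumerate (discard, placements) pairs for a three-card hand.

    hand: list of three distinct cards dealt this round.
    rows: dict mapping lane name to the cards already placed there.
    """
    actions = []
    for i, discard in enumerate(hand):
        keep = hand[:i] + hand[i + 1:]
        # Try every assignment of the two kept cards to lanes.
        for targets in product(ROW_CAPACITY, repeat=2):
            fits = all(
                len(rows[r]) + targets.count(r) <= ROW_CAPACITY[r]
                for r in set(targets)
            )
            if fits:
                actions.append((discard, tuple(zip(keep, targets))))
    return actions
```

With no lane close to full, this gives 3 discards times 9 lane assignments, so 27 candidate placements per round; the count shrinks as lanes fill up.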

Abstract

The invention discloses a deep reinforcement learning method for second- and third-round card placement in Big Pineapple poker. The method requires no expert data: by combining a neural network with Monte Carlo tree search, an agent learns a decision policy through self-play and continuously improves its payoff. The second-round board is taken as the root node, and within a given number of iterations leaf nodes are selected according to prior probability and average payoff. If a selected node has not been expanded, the board information is encoded and fed into the neural network to obtain prior probabilities and a value estimate, the node is expanded, and the payoffs of all its ancestor nodes are updated with the selected leaf's value estimate. After the iterations finish, the visit counts of all placements at the root node are normalized, and the placement with the highest probability is chosen before moving on to the next round of decisions. After all decisions are completed, training data are collected to update the network parameters. After a large amount of self-play learning, the method scores heavily against the untrained network, and it provides a general and feasible approach for research on incomplete-information games.
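The search loop described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the selection score is an AlphaZero-style PUCT rule with a hypothetical constant `c_puct` (the excerpt only says selection uses prior probability and average payoff), values are backed up without sign flips, random card deals are ignored, and `evaluate` and `step` are hypothetical callables standing in for the neural network and the game rules. The toy game at the end exists only to exercise the loop.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior      # prior probability from the network
        self.visits = 0         # number of times this node was sampled
        self.value_sum = 0.0    # sum of backed-up value estimates
        self.children = {}      # action -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.0):
    """Pick the child maximizing average payoff plus a prior-weighted bonus."""
    sqrt_parent = math.sqrt(node.visits)

    def score(item):
        child = item[1]
        bonus = c_puct * child.prior * sqrt_parent / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children.items(), key=score)


def run_search(root_state, evaluate, step, n_iter=200):
    """Return (best_action, policy) where policy is normalized visit counts."""
    root = Node(prior=1.0)
    for _ in range(n_iter):
        node, state, path = root, root_state, [root]
        # Selection: descend until reaching an unexpanded node.
        while node.children:
            action, node = select_child(node)
            state = step(state, action)
            path.append(node)
        # Expansion: query the (stand-in) network for priors and a value.
        priors, value = evaluate(state)
        for action, p in priors.items():
            node.children[action] = Node(p)
        # Backup: update every node on the path with the leaf estimate.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    total = max(sum(c.visits for c in root.children.values()), 1)
    policy = {a: c.visits / total for a, c in root.children.items()}
    return max(policy, key=policy.get), policy


# Toy stand-ins (hypothetical): two decisions, payoff 1 if the first is "b".
def toy_step(state, action):
    return state + (action,)


def toy_evaluate(state):
    if len(state) == 2:  # terminal: no further actions
        return {}, 1.0 if state[0] == "b" else 0.0
    return {"a": 0.5, "b": 0.5}, 0.0
```

Calling `run_search((), toy_evaluate, toy_step)` concentrates visits on the first action that leads to the higher payoff. In self-play, the returned `policy` would be the natural training target to store for the later network update.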

Description

Technical field
[0001] The invention relates to the technical field of artificial intelligence and machine game playing, and in particular to a deep reinforcement learning method for second- and third-round card placement in Big Pineapple poker.
Background
[0002] The long-term goal of artificial intelligence (AI) is to allow computers to learn autonomously and surpass human performance in challenging domains. Achieving superhuman performance in games has been hailed as a cornerstone for solving more challenging real-world problems. Many games have therefore been brought into AI research, such as chess, Go, and poker. In the field of complete-information games, AI research has achieved great success. For example, in 1997 the Deep Blue computer defeated the world chess champion Kasparov 3.5-2.5, and in 2017 AlphaGo defeated Ke Jie, the world's top-ranked Go player. However, in the field of incomplete in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N5/00
Inventors: 袁文广, 韦佳, 张加佳
Owner: SOUTH CHINA UNIV OF TECH