Crowd evacuation simulation method in dynamic environment based on deep reinforcement learning

A technology combining reinforcement learning and dynamic environments, applied in the fields of crowd simulation and computer simulation. It addresses the exponential growth in computational complexity, the huge storage space and indexing time, and the low accuracy of crowd simulation, and has the effect of increasing the randomness of crowd behavior.

Active Publication Date: 2021-02-09
AEROSPACE INFORMATION RES INST CAS +1

AI Technical Summary

Problems solved by technology

On the one hand, the state of the environment and the movement of the agent are continuous processes; that is, both the state space and the action space are continuous. However, algorithms such as Q-Learning and SARSA only support discrete state and action spaces, and a huge continuous space may lead to the curse of dimensionality.
On the other hand, this type of algorithm usually stores the state-action mapping in a Q table. With continuous state and action spaces, the table requires huge storage space and indexing time, resulting in an exponential increase in computational complexity.
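To make the storage argument concrete, the following minimal sketch counts how many entries a Q table would need if every continuous state dimension were discretised into a fixed number of bins. The function name `q_table_entries`, the bin count, and the action count are illustrative assumptions, not values from the patent.

```python
def q_table_entries(state_dims: int, bins_per_dim: int, num_actions: int) -> int:
    """Count the Q-values a tabular method stores after discretising the state.

    Each of the `state_dims` continuous dimensions is split into `bins_per_dim`
    bins (an assumed discretisation), giving bins_per_dim ** state_dims states,
    each with one entry per action.
    """
    return (bins_per_dim ** state_dims) * num_actions


# Table size grows exponentially with the number of state dimensions.
for dims in (2, 4, 8):
    entries = q_table_entries(dims, bins_per_dim=10, num_actions=8)
    print(f"{dims} state dimensions -> {entries:,} Q-table entries")
# 2 state dimensions -> 800 Q-table entries
# 4 state dimensions -> 80,000 Q-table entries
# 8 state dimensions -> 800,000,000 Q-table entries
```

A function approximator such as a neural network avoids this table entirely, because its parameter count does not depend on the number of distinct states.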
[0007] Due to the low accuracy and weak



Embodiment Construction

[0030] 1) Deep reinforcement learning algorithm

[0031] As shown in Figure 1, the present invention provides a crowd evacuation simulation method in a dynamic environment based on deep reinforcement learning, which includes:

[0032] The crowd is a multi-agent system. For a single pedestrian agent, a deep neural network approximates the mapping function from state to action and serves as the agent's behavior controller. By observing the state of the dynamic environment, the pedestrian agent uses this mapping function to make behavioral decisions and takes the corresponding actions from the action space. The goal of a reinforcement learning agent is to find an optimal policy. A policy (also called a strategy), usually denoted by the symbol π, is the mapping from state to action: it is the probability distribution of the agent's actions in a given state, $\pi(a \mid s) = P[A_t = a \mid S_t = s]$, where S is a finite state set; A is a finite action set; is the state transition proba...
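The idea of a neural network acting as a stochastic behavior controller can be sketched as follows. This is a minimal, untrained illustration in plain Python, not the network described in the patent: the class name `PolicyNetwork`, the layer sizes, the four-element observation, and the five-action space are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyNetwork:
    """One-hidden-layer network mapping a state vector to action probabilities.

    Weights are random and untrained; a real controller would learn them
    from the return values fed back by the environment.
    """

    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        self.rng = random.Random(seed)
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(num_actions)]

    def action_probabilities(self, state):
        """Return pi(a | s) for every action a in the finite action set."""
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)))
                  for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return softmax(logits)

    def sample_action(self, state):
        """Draw an action index according to pi(a | s)."""
        probs = self.action_probabilities(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical observation: offset to the exit (x, y), hazard distance, local density.
policy = PolicyNetwork(state_dim=4, hidden_dim=8, num_actions=5)
state = [0.3, -0.7, 0.9, 0.2]
probs = policy.action_probabilities(state)
action = policy.sample_action(state)
print(probs, action)
```

Sampling from the distribution, rather than always taking the most probable action, is one way a policy of this kind can produce the behavioral randomness the method aims for.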


Abstract

The invention provides a crowd evacuation simulation method in a dynamic environment based on deep reinforcement learning. The method treats the crowd as a multi-agent system and, for a single pedestrian agent, uses a deep neural network to approximate the mapping function from state to action, which serves as the agent's behavior controller. The pedestrian agent makes a behavior decision by observing the state of the dynamic environment and applying the mapping function, and takes the corresponding action from the action space. In an evacuation simulation over a discrete time sequence, a policy is the probability distribution of the agent's actions over the time sequence in a given state. The environment's feedback to the agent is expressed as a return value. The agent's goal is to maximize the expectation of the cumulative return value; that is, to search for an optimal action-value function and thereby obtain an optimal policy. The method enables crowd evacuation simulation in highly dynamic environments, which is difficult to achieve with classical crowd simulation; the simulation results are closer to the real situation, and the randomness of crowd behavior is enhanced.
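The objective stated in the abstract can be written in standard reinforcement learning notation. The block below is a sketch under assumptions: the discount factor γ and the reward symbol R are conventional additions that the abstract does not name, and the patent's own formulation may differ.

```latex
\begin{align}
  G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad \gamma \in [0, 1]
      && \text{(cumulative return from time step } t\text{)} \\
  Q^{*}(s, a) &= \max_{\pi} \, \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s,\ A_t = a \right]
      && \text{(optimal action-value function)} \\
  \pi^{*}(s) &= \arg\max_{a \in A} Q^{*}(s, a)
      && \text{(optimal policy derived from } Q^{*}\text{)}
\end{align}
```

Read top to bottom, the three lines mirror the abstract: the return value accumulates over the discrete time sequence, the agent searches for the action-value function that maximizes its expectation, and the optimal policy follows from that function.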

Description

Technical field

[0001] The invention belongs to the technical field of crowd simulation and computer simulation, and in particular relates to a crowd evacuation simulation method in a dynamic environment based on deep reinforcement learning.

Background art

[0002] From the perspective of crowd evacuation simulation, commonly used crowd simulation models can be divided into macroscopic models and microscopic models. A macroscopic model mainly models group behavior: it treats the crowd as a continuum that follows the laws of fluid mechanics, and suits large-scale crowd simulation over a large area. A microscopic model uses a single individual as the basic modeling unit; group behavior emerges from the movement of a large number of individuals and the interactions between them. By comparison, microscopic models are better suited to modeling and simulating individual behavior and have received more research attention.

[0003] Each model has its own applicabl...


Application Information

IPC(8): G06Q10/04, G06Q50/26, G06N3/08
CPC: G06Q10/04, G06Q50/265, G06N3/08
Inventor: 龚建华, 申申, 孙麇, 李毅, 殷兵晓, 武栋
Owner: AEROSPACE INFORMATION RES INST CAS