
38 results about "Partially observable Markov decision process" patented technology

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a probability distribution over the set of possible states, based on a set of observations and observation probabilities, and the underlying MDP.
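The belief maintenance described above is a Bayesian filter: predict through the transition model, then reweight by the observation likelihood. A minimal sketch in plain Python (the two-state toy model at the bottom is invented for illustration, not taken from any of the patents below):

```python
def belief_update(b, a, o, T, Z):
    """Bayesian belief update for a POMDP.

    b : current belief over states, b[s] = P(s)
    a : action taken
    o : observation received
    T : T[a][s][s2] = P(s2 | s, a)   (transition model)
    Z : Z[a][s2][o] = P(o | s2, a)   (observation model)
    Returns the posterior belief over states.
    """
    S = len(b)
    # Predict: push the belief through the transition model,
    # then correct: weight by the likelihood of the observation.
    unnorm = [Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(S))
              for s2 in range(S)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-state toy problem: one action, two possible observations.
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
Z = [[[0.8, 0.2],
      [0.3, 0.7]]]
b_next = belief_update([0.5, 0.5], a=0, o=0, T=T, Z=Z)
```

Because the agent never sees the state directly, every decision below is made against a belief like `b_next` rather than against the true state.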

Access network service function chain deployment method based on random learning

The invention relates to an access network service function chain deployment method based on random learning, belonging to the technical field of wireless communication. To address the high delay caused by physical network topology changes in a 5G cloud access network scenario, the method establishes a service function chain deployment scheme based on a partially observable Markov decision process with partially perceived topology. Under 5G access network uplink conditions, changes in the underlying physical network topology are perceived through a heartbeat-packet observation mechanism; because observation errors prevent the complete true topology from being acquired, deployment of the access network slice's service function chain is adaptively and dynamically adjusted using partial perception and random learning based on the partially observable Markov decision process, so that the delay of the slice on the access network side can be optimized. Dynamic deployment is realized by deciding the optimal service function chain deployment mode from the partially perceived network topology changes, optimizing delay and improving resource utilization.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Frequency spectrum detection method based on partially observable Markov decision process model

The invention relates to a frequency spectrum detection method based on a partially observable Markov decision process model. The method comprises the following steps: channel state information is added to a channel state history sequence, and the time delay is estimated so that the channel state information is obtained; the initial belief state and state transition probability of each channel are calculated; statistics of channel usage and the state transition probabilities are acquired through observation over a period of time, and a Markov model is established for the usage state of each channel; as the time slot advances, the state history sequence and the current time slot value are updated; the instantaneous reward is calculated, and the belief state is updated with the response information according to the channels' state transition probabilities; the value function of each channel after performing different actions is calculated; and the maximum discounted return obtainable by secondary users is calculated to derive a strategy that maximizes the total discounted reward. The channels are sorted in decreasing order of total reward, and when data transmission is required, the user is guided to attempt access to the channels in this new order.
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI
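The "sort channels by discounted total reward" step of this abstract can be sketched as follows. This is a simplified illustration, not the patented method: it assumes each channel's belief of being idle decays geometrically with its own stay-idle probability, and all numbers are invented:

```python
def channel_values(beliefs, p_stay_idle, reward=1.0, gamma=0.9, horizon=20):
    """Expected discounted reward for repeatedly accessing each channel.

    beliefs     : beliefs[i] = current belief that channel i is idle
    p_stay_idle : p_stay_idle[i] = P(idle at t+1 | idle at t) for channel i
    Each slot on an idle channel earns `reward`; gamma discounts the future.
    A sketch of per-channel value estimation, not a full POMDP solve.
    """
    values = []
    for b, p in zip(beliefs, p_stay_idle):
        v, b_t = 0.0, b
        for t in range(horizon):
            v += (gamma ** t) * b_t * reward   # expected reward in slot t
            b_t *= p                           # belief the channel stays idle
        values.append(v)
    return values

# Hypothetical example: a currently-busy-looking channel with a very sticky
# idle state (channel 1) can beat a currently-idle but volatile channel.
beliefs = [0.9, 0.6, 0.8]
p_stay  = [0.7, 0.95, 0.5]
vals = channel_values(beliefs, p_stay)
# Try channels in decreasing order of discounted total reward.
order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
```

The secondary user then attempts access in `order`, exactly the decreasing-reward ordering the abstract describes.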

Process automatic decision-making and reasoning method and device, computer equipment and storage medium

Status: Pending (CN114647741A). Effects: dynamic; improved decision-making response speed. Classifications: data processing applications; neural architectures. Concepts: decision model; decision taking.
The invention belongs to the field of deep learning and relates to a process automatic decision-making and reasoning method and device, computer equipment, and a storage medium. The method comprises the steps of: constructing a part-production process knowledge base model; constructing a three-level information model of part information, process knowledge, and equipment information, integrating production data, and constructing a process time-sequence knowledge graph; extracting process knowledge features from the graph; splitting the production task with an automatic decision-making model based on those features, extracting the spatial features of the subtasks, and retrieving from the knowledge base the process knowledge satisfying the spatial features and time-sequence requirements for process decision-making; and defining the process time-sequence knowledge graph as the environment of a partially observable Markov decision process algorithm to carry out process reasoning. Unknown production processes are reasoned over; after a reasoning path is obtained and manually verified, it is added back into the process time-sequence knowledge graph, making the process knowledge more complete.
Owner:GUANGDONG POLYTECHNIC NORMAL UNIV

Heterogeneous Internet of Vehicles user association method based on multi-agent deep reinforcement learning

The invention discloses a heterogeneous Internet of Vehicles user association method based on multi-agent deep reinforcement learning. The method first models the problem as a partially observable Markov decision process and then applies the idea of decomposing a team value function. Specifically, it comprises the steps of: building a centralized-training, distributed-execution framework in which the team value function is connected to each user value function through summation, so that the user value functions are trained implicitly; then, drawing on experience replay and a target-network mechanism, performing action exploration and selection with an epsilon-greedy strategy, storing historical information with a recurrent neural network, and computing the loss with a Huber loss function while performing gradient descent; and finally learning the association strategy of the heterogeneous Internet of Vehicles users. Compared with a multi-agent independent deep Q-learning algorithm and other traditional algorithms, the method more effectively improves energy efficiency while reducing switching overhead in a heterogeneous Internet of Vehicles environment.
Owner:NANJING UNIV OF SCI & TECH
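The summation-based team value function in this abstract is the additive decomposition idea (as in value-decomposition networks). A minimal sketch with invented tabular Q functions standing in for the per-user networks:

```python
import random

def team_q(per_agent_q, joint_obs, joint_action):
    """Additive decomposition: Q_team(o, a) = sum_i Q_i(o_i, a_i).

    per_agent_q  : list of per-user Q functions, q_i(obs_i, action_i) -> float
    joint_obs    : list of local observations, one per user
    joint_action : list of chosen actions, one per user
    Training the summed team value implicitly trains each user's own value.
    """
    return sum(q(o, a) for q, o, a in zip(per_agent_q, joint_obs, joint_action))

def epsilon_greedy(q_values, epsilon, rng=random):
    """Exploration/selection rule mentioned in the abstract: random action
    with probability epsilon, otherwise the greedy (argmax) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical check: two users with small tabular Q functions.
q1 = lambda o, a: [[0.1, 0.9], [0.4, 0.2]][o][a]
q2 = lambda o, a: [[0.3, 0.5], [0.8, 0.6]][o][a]
total = team_q([q1, q2], joint_obs=[0, 1], joint_action=[1, 0])
greedy_action = epsilon_greedy([0.2, 0.7, 0.1], epsilon=0.0)
```

Because the team value is a plain sum, a gradient on the team loss flows directly into each user's Q network, which is what allows centralized training with distributed (per-user) execution.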

Multi-robot collaborative navigation and obstacle avoidance method

Status: Active (CN113821041A). Effects: good local minima; generalization strategy. Classifications: position/course control in two dimensions; vehicles. Concepts: algorithm; engineering.
The invention discloses a multi-robot collaborative navigation and obstacle avoidance method comprising the following steps: modeling the decision process of a robot in an unknown environment as a partially observable Markov decision process; according to the current robot's environment modeling information, introducing a deep deterministic policy gradient algorithm, extracting sampled image samples, and inputting them into a convolutional neural network for feature extraction; improving on the deep deterministic policy gradient algorithm by introducing a long short-term memory network to give the network memory, and using a frame-skipping mechanism to make the image data more accurate and stable; modifying the experience pool replay mechanism by assigning a priority to each stored experience sample, so that rare but important experiences contribute more to learning and learning efficiency is improved; and finally establishing a multi-robot navigation and obstacle avoidance simulation system. The method is advantageous in that a curriculum learning mode lets the robot learn navigation and obstacle avoidance from easy to difficult, accelerating training.
Owner:SUN YAT SEN UNIV
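The modified replay mechanism in this abstract, where each stored experience carries a priority, can be sketched as a proportional prioritized buffer. This is a simplified illustration (no sum-tree, no importance-sampling correction), with invented example transitions:

```python
import random

class PrioritizedReplay:
    """Replay buffer where each experience carries a priority, so rare but
    important samples are drawn more often during learning."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer, self.priorities = [], []

    def push(self, experience, priority):
        # Evict the oldest experience once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(experience)
        self.priorities.append(priority)

    def sample(self, batch_size, rng=random):
        # Draw proportionally to priority (e.g. TD-error magnitude).
        return rng.choices(self.buffer, weights=self.priorities, k=batch_size)

buf = PrioritizedReplay(capacity=100)
buf.push(("s0", "a0", 0.1, "s1"), priority=0.1)    # routine transition
buf.push(("s1", "a1", 5.0, "s2"), priority=10.0)   # rare, high-error one
batch = buf.sample(4)
```

Giving the rare transition a 100x priority means it dominates the sampled batches, which is how "few and important experiences" get applied more to learning.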

Computing offloading and resource management method in edge computing based on deep reinforcement learning

Status: Pending (CN113821346A). Effects: maximizes self-interest; addresses different interest pursuits. Classifications: resource allocation; program loading/initiating. Concepts: edge node; engineering.
The invention discloses a computing offloading and resource management method in edge computing based on deep reinforcement learning, comprising the following steps: constructing an edge computing communication model based on a partially observable Markov decision process, the model comprising M + N agents, where the M agents are edge nodes and the N agents are users; setting a target optimization function according to a user cost minimization objective and an edge node utility maximization objective; setting the time slot length and time frame length and initializing the time slot and time frame; having the edge nodes and the users each use the partially observable Markov decision process to obtain a resource allocation strategy and a task offloading strategy; optimizing the target optimization function with an actor-critic model according to the task offloading strategy and the resource allocation strategy; and dividing and processing the computing task according to the optimized target optimization function. The invention resolves the differing interests of the edge devices and the users and ensures each party's interests to the maximum extent.
Owner:TIANJIN UNIV
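The actor-critic optimization this abstract relies on can be illustrated with a single tabular advantage actor-critic update. This is a minimal generic sketch of the technique, not the patented multi-agent model, and all values are invented:

```python
import math

def actor_critic_update(policy_logits, value, reward, next_value,
                        action, gamma=0.95, lr=0.1):
    """One advantage actor-critic update step for a single state.

    policy_logits : softmax preferences over actions in the current state
    value         : critic's estimate V(s)
    next_value    : critic's estimate V(s')
    Returns updated (policy_logits, value).
    """
    # Critic: the TD error doubles as the actor's advantage signal.
    td_error = reward + gamma * next_value - value
    value += lr * td_error
    # Actor: softmax policy gradient, grad log pi(i|s) = 1{i==action} - pi(i|s).
    exp = [math.exp(l) for l in policy_logits]
    z = sum(exp)
    probs = [e / z for e in exp]
    policy_logits = [l + lr * td_error * ((i == action) - probs[i])
                     for i, l in enumerate(policy_logits)]
    return policy_logits, value

logits, v = [0.0, 0.0], 0.0
# A rewarding transition should raise V(s) and the taken action's preference.
logits, v = actor_critic_update(logits, v, reward=1.0, next_value=0.0, action=0)
```

In the patented setting each edge node and user would run its own actor and critic over its local POMDP belief; the update rule per agent has this same shape.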

Individual movement intervention infectious disease prevention and control method and system

The invention provides an individual-mobility-intervention infectious disease prevention and control method and system. The method comprises the following steps: acquiring daily historical state information and individual relationship information of individual users in a target city within a preset time interval; and inputting the historical state information and the individual relationship information into a trained individual-mobility-intervention infectious disease prevention and control model to obtain prevention and control intervention measures for each user individual in the target city. The trained model is obtained by training a graph neural network, a long short-term memory network, and an intelligent agent on sample user individual state information and sample individual relationship information, the agent being constructed based on a partially observable Markov decision process; and the sample user individual state information comprises health state information of latent infected persons converting into symptomatic infected persons. The invention reduces the number of infected people as much as possible under low travel intervention.
Owner:TSINGHUA UNIV