4934 results about "Reinforcement learning" patented technology

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
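At its core this is the agent-environment loop: the agent acts, the environment returns a reward, and the agent updates so that cumulative reward grows. The minimal self-contained sketch below illustrates this with a hypothetical two-armed bandit and an epsilon-greedy agent; the arm probabilities and exploration rate are illustrative assumptions, not drawn from any patent below.

```python
import random

# Hypothetical 2-armed bandit environment (illustrative assumption).
ARM_MEANS = [0.3, 0.7]

def pull(arm):
    # Bernoulli reward with the arm's success probability
    return 1.0 if random.random() < ARM_MEANS[arm] else 0.0

q = [0.0, 0.0]        # action-value estimates
counts = [0, 0]
epsilon = 0.1         # exploration rate

for step in range(1000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: q[a])
    reward = pull(arm)
    counts[arm] += 1
    # incremental mean update of the value estimate
    q[arm] += (reward - q[arm]) / counts[arm]

print(q)  # estimates approach ARM_MEANS as cumulative reward is maximized
```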

Deep and reinforcement learning-based real-time online path planning method

The present invention provides a deep and reinforcement learning-based real-time online path planning method. The method uses deep learning to extract high-level semantic information from images and uses reinforcement learning to perform end-to-end, real-time path planning in the environment. During training, image information collected from the environment is fed into a scene analysis network as the current state to obtain a parsing result; the parsing result is input into a designed deep recurrent neural network; and training yields the agent's action decision at each step in the given scene, from which an optimal complete path is obtained. In actual application, image information collected by a camera is input into the trained deep reinforcement learning network to obtain the agent's walking direction. The method makes the fullest use of the available image information while remaining robust and only weakly dependent on the environment, and achieves real-time path planning from scene information.
Owner:NORTHWESTERN POLYTECHNICAL UNIV
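The pipeline the abstract describes (a scene-parsing stage feeding a deep recurrent network that emits a per-step walking-direction decision) can be sketched as follows. This is a minimal illustration assuming PyTorch; the layer sizes, action count, and input resolution are assumptions rather than values from the patent.

```python
import torch
import torch.nn as nn

class ScenePolicy(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        # stand-in for the scene analysis network
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # stand-in for the deep recurrent network
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 3, H, W) camera images
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.rnn(feats, hidden)
        return self.head(out), hidden  # per-step action logits

logits, _ = ScenePolicy()(torch.randn(1, 8, 3, 64, 64))
action = logits[0, -1].argmax().item()  # current walking direction
```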

Dialogue automatic reply system based on deep learning and reinforcement learning

Active · CN106448670A · Personality fit · Speech recognition · Dialog system · Active state
The invention discloses a dialogue automatic reply system based on deep learning and reinforcement learning. The system comprises: a user interaction module, which receives question information input by a user in the dialogue system interface; a session management module, which records the user's activity state, including historical dialogue information, user position-change information, and user emotion-change information; a user analysis module, which analyzes the user's registration information and activity state and builds a user portrait to obtain user profile information; a dialogue module, which generates reply information through a language model from the user's question combined with the user portrait; and a model learning module, which updates the language model through reinforcement learning according to the replies the language model generates. Given the dialogue text a user inputs, the system can produce replies that fit the user's personality by combining context information, the user's personality characteristics, and the intentions expressed in the dialogue.
Owner:EMOTIBOT TECH LTD
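The model learning module's update can be sketched as a REINFORCE-style policy-gradient step: the language model is treated as a policy and nudged toward replies that earn higher reward. The minimal sketch below assumes PyTorch; the vocabulary size, network, and reward function are placeholder assumptions (a real reward would score fit to the user portrait and context).

```python
import torch
import torch.nn as nn

vocab, dim = 100, 32
embed = nn.Embedding(vocab, dim)
lm = nn.GRU(dim, dim, batch_first=True)     # stand-in language model
head = nn.Linear(dim, vocab)
params = list(embed.parameters()) + list(lm.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def reward_for(reply_tokens):
    # placeholder reward: would score user-portrait / context fit
    return float(len(set(reply_tokens.tolist()))) / len(reply_tokens)

question = torch.randint(0, vocab, (1, 6))  # encoded user question
out, _ = lm(embed(question))
logits = head(out[:, -1])                   # next-token distribution

dist = torch.distributions.Categorical(logits=logits)
tok = dist.sample()                         # sample one reply token
r = reward_for(tok)
loss = -(dist.log_prob(tok) * r).mean()     # REINFORCE: reinforce rewarded replies
opt.zero_grad(); loss.backward(); opt.step()
```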

Autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning

Active · CN108803321A · Stabilize the learning process · Optimal target strategy · Adaptive control · Simulation · Intelligent control
The invention provides an autonomous underwater vehicle (AUV) trajectory tracking control method based on deep reinforcement learning, belonging to the fields of deep reinforcement learning and intelligent control. The method comprises the steps of: defining the AUV trajectory tracking control problem; establishing a Markov decision process model of the AUV trajectory tracking problem; constructing a hybrid policy-evaluation network consisting of multiple policy networks and multiple evaluation networks; and finally solving the target policy for AUV trajectory tracking control with the constructed hybrid policy-evaluation network. For the multiple evaluation networks, the performance of each evaluation network is scored by a defined expected Bellman absolute error, and only the evaluation network with the lowest performance is updated at each time step; for the multiple policy networks, one policy network is randomly selected at each time step and updated with a deterministic policy gradient, so that the finally learned policy is the mean of all the policy networks. The method is not easily influenced by poor AUV historical tracking trajectories and achieves high precision.
Owner:TSINGHUA UNIV
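The distinctive step is the critic-selection rule. Below is a minimal sketch, assuming PyTorch, of scoring each evaluation network by an empirical expected Bellman absolute error over a batch and updating only the worst-scoring one; the network sizes and the dummy transition batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, act_dim, gamma = 8, 2, 0.99
critics = [nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1)) for _ in range(3)]
opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def bellman_abs_error(critic, s, a, r, s2, a2):
    # empirical expected |Q(s,a) - (r + gamma * Q(s',a'))| over the batch
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s2, a2], -1))
    return (critic(torch.cat([s, a], -1)) - target).abs().mean()

# dummy transition batch (s, a, r, s', a'); a' would come from the target policy
s, a = torch.randn(32, state_dim), torch.randn(32, act_dim)
r, s2, a2 = torch.randn(32, 1), torch.randn(32, state_dim), torch.randn(32, act_dim)

errors = [bellman_abs_error(c, s, a, r, s2, a2).item() for c in critics]
worst = max(range(len(critics)), key=lambda i: errors[i])  # lowest performance = largest error

loss = bellman_abs_error(critics[worst], s, a, r, s2, a2)  # recompute with grad
opts[worst].zero_grad(); loss.backward(); opts[worst].step()
```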

Traffic signal self-adaptive control method based on deep reinforcement learning

Inactive · CN106910351A · Realize precise perception · Solve the problem of inaccurate perception of traffic status · Controlling traffic signals · Neural architectures · Traffic signal · Return function
The invention relates to the technical fields of traffic control and artificial intelligence and provides a traffic signal self-adaptive control method based on deep reinforcement learning. The method includes the following steps: 1, define a traffic signal control agent, a state space S, an action space A, and a return function r; 2, pre-train a deep neural network; 3, train the neural network with a deep reinforcement learning method; 4, control the traffic signals according to the trained deep neural network. By preprocessing traffic data acquired from magnetic induction, video, RFID, vehicle networking, and the like, a low-level representation of the traffic state containing vehicle position information is obtained; the traffic state is then perceived through a deep multilayer perceptron to obtain high-level abstract features of the current traffic state; on this basis, a suitable timing plan is selected from those features through the decision-making capacity of reinforcement learning. Self-adaptive control of traffic signals is thus achieved, vehicle travel time is shortened, and safe, smooth, orderly, and efficient traffic operation is supported.
Owner:DALIAN UNIV OF TECH
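Step 3 amounts to a deep Q-learning update: a multilayer perceptron maps the preprocessed traffic state to Q-values over candidate timing plans, and the plan with the highest Q-value is selected. The minimal sketch below assumes PyTorch; the state size, number of plans, and reward (negative delay) are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_state, n_plans, gamma = 64, 4, 0.9
q_net = nn.Sequential(nn.Linear(n_state, 128), nn.ReLU(), nn.Linear(128, n_plans))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# dummy transition: state, chosen plan, reward, next state
s = torch.randn(1, n_state)
a = torch.tensor([2])
r = torch.tensor([-3.5])            # e.g. minus the measured vehicle delay
s2 = torch.randn(1, n_state)

# temporal-difference target: r + gamma * max_a' Q(s', a')
q_sa = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
with torch.no_grad():
    target = r + gamma * q_net(s2).max(1).values
loss = nn.functional.mse_loss(q_sa, target)
opt.zero_grad(); loss.backward(); opt.step()

plan = q_net(s).argmax(1).item()    # timing plan for the current state
```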

Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

The invention discloses an unmanned aerial vehicle (UAV) network hovering position optimization method based on multi-agent deep reinforcement learning. The method comprises the following steps: first, modeling the channel model, coverage model, and energy loss model of a UAV-to-ground communication scene; modeling the throughput maximization problem of the UAV-to-ground communication network as a partially observable Markov decision process; obtaining local observation information and instantaneous rewards through continuous interaction between the UAVs and the environment, and performing centralized training on this information to obtain a distributed policy network; and deploying the policy network in each UAV, so that each UAV obtains a moving direction and moving distance decision from its own local observations, adjusts its hovering position, and cooperates in a distributed manner. In addition, proportional fair scheduling and UAV energy consumption information are introduced into the instantaneous reward function, so that fairness of the UAVs' service to ground users is guaranteed while throughput is improved, energy consumption is reduced, and the UAV cluster can adapt to a dynamic environment.
Owner:DALIAN UNIV OF TECH
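The decentralized-execution side can be sketched simply: each UAV evaluates its own copy of the centrally trained policy network on its local observation and decodes a moving direction and a moving distance. This minimal sketch assumes PyTorch; the observation size, direction discretization, and distance cap are assumptions.

```python
import torch
import torch.nn as nn

obs_dim, n_dirs = 16, 8          # e.g. 8 compass directions (assumed)
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, n_dirs + 1),   # direction logits plus a scalar distance
)

def act(local_obs):
    out = policy(local_obs)
    direction = out[:n_dirs].argmax().item()
    distance = torch.sigmoid(out[n_dirs]).item() * 50.0  # metres, assumed cap
    return direction, distance

# each UAV decides independently from its own local observation
uav_observations = [torch.randn(obs_dim) for _ in range(3)]
decisions = [act(o) for o in uav_observations]
print(decisions)
```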

Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning and unmanned aerial vehicle

Active · CN110488861A · Achieve flight control optimization · Meet the needs of the actual flight environment · Position/course control in three dimensions · Flight direction · Trajectory optimization
The invention discloses an unmanned aerial vehicle (UAV) trajectory optimization method and device based on deep reinforcement learning, and a UAV. The method comprises the steps of: constructing a reinforcement learning network in advance, and generating state data and action-decision data in real time during the UAV's flight; and, taking the state data as input, the action-decision data as output, and the instantaneous energy efficiency as the reward, optimizing the policy parameters with the PPO algorithm and outputting the optimal policy. The device comprises a construction module, a training data collection module, and a training module. The UAV comprises a processor configured to execute the deep reinforcement learning-based trajectory optimization method. The method can learn autonomously from accumulated flight data, can intelligently determine the optimal flight speed, acceleration, flight direction, and return time in an unknown communication scene, derives the flight strategy with the best energy efficiency, and has strong environmental adaptability and generalization capability.
Owner:BEIJING UNIV OF POSTS & TELECOMM
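The PPO step named in the abstract optimizes a clipped surrogate objective over the policy parameters, with advantages derived from the instantaneous energy-efficiency reward. The minimal sketch below assumes PyTorch; the batch shapes and clip range are conventional assumptions, not values from the patent.

```python
import torch

eps = 0.2  # PPO clip parameter (conventional default)

def ppo_loss(log_probs_new, log_probs_old, advantages):
    # clipped surrogate: limit how far the new policy moves from the old one
    ratio = torch.exp(log_probs_new - log_probs_old)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# dummy batch: advantages would come from energy-efficiency rewards
lp_new = torch.randn(32, requires_grad=True)
lp_old = lp_new.detach() + 0.1 * torch.randn(32)
adv = torch.randn(32)

loss = ppo_loss(lp_new, lp_old, adv)
loss.backward()  # gradient step on the flight policy's parameters
```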

Human skeleton behavior recognition method and device based on deep reinforcement learning

The invention discloses a human skeleton behavior recognition method and device based on deep reinforcement learning. The method comprises: uniformly sampling each video segment in a training set to obtain videos with a fixed frame number, and using them to train a graph convolutional neural network; after the parameters of the graph convolutional network are fixed, training a frame-extraction network against it to obtain representative frames meeting a preset condition; updating the graph convolutional network with those representative frames; obtaining a target video, uniformly sampling it, and sending the sampled frames to the frame-extraction network to obtain key frames; and sending the key frames to the updated graph convolutional network to obtain the final behavior class. The discriminability of the selected frames is thereby enhanced, redundant information is removed, recognition performance is improved, and computation at the test phase is reduced. Moreover, by fully exploiting the topological relationships of the human skeleton, behavior recognition performance is further improved.
Owner:TSINGHUA UNIV
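The frame-selection idea can be sketched as uniform sampling to a fixed length plus a reward that scores selected frames by the classifier's confidence on the true label. The sketch below assumes PyTorch; the classifier is a stand-in, not the patent's graph convolutional network, and the reward design is an assumption.

```python
import torch

def uniform_sample(video, n_frames):
    # video: (T, ...) tensor; pick n_frames evenly spaced indices
    idx = torch.linspace(0, video.shape[0] - 1, n_frames).long()
    return video[idx]

def selection_reward(classifier, frames, label):
    # reward the selection policy with the classifier's confidence on the true class
    probs = torch.softmax(classifier(frames), dim=-1)
    return probs[label].item()

video = torch.randn(300, 25, 3)        # 300 frames, 25 joints, xyz coordinates
clip = uniform_sample(video, 32)        # fixed frame number for training

classifier = lambda f: torch.randn(10)  # stand-in for the graph convolutional net
print(selection_reward(classifier, clip, label=3))
```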

Mobile robot path planning method combining a deep autoencoder with the Q-learning algorithm

Active · CN105137967A · Achieve cognition · Improve the ability to process images · Biological neural network models · Position/course control in two dimensions · Algorithm · Reward value
The invention provides a mobile robot path planning method combining a deep autoencoder with the Q-learning algorithm. The method comprises a deep autoencoder part, a BP neural network part, and a reinforcement learning part. The deep autoencoder processes images of the robot's environment to extract features from the image data, laying the foundation for subsequent environment cognition. The BP neural network fits reward values to the image feature data, thereby linking the deep autoencoder to reinforcement learning. The Q-learning algorithm acquires knowledge in an action-evaluation setting through interactive learning with the environment and improves the action policy to suit the environment and achieve the desired goal. The robot interacts with the environment to learn autonomously and finally finds a feasible path from start point to end point. Combining the deep autoencoder with the BP neural network strengthens the system's image processing capability and realizes environment cognition.
Owner:BEIJING UNIV OF TECH
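At the core of the method is the standard Q-learning update; in the patent the state would be derived from the autoencoder's image features, whereas the minimal sketch below uses plain integer states and a stub environment as illustrative assumptions.

```python
import random

n_states, n_actions = 10, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    # stub environment: would be the robot moving and re-encoding its camera image
    return random.randrange(n_states), random.choice([-1.0, 0.0, 1.0])

state = 0
for _ in range(500):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```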