
1194 results about "Reinforcement learning algorithm" patented technology

Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate,...

Artificial intelligence training platform for intelligent networked vehicle planning and decision-making module

The invention relates to the technical field of intelligent-vehicle automatic driving and traffic simulation, and specifically to an artificial intelligence training platform for the planning and decision-making module of an intelligent networked vehicle. It aims to raise the intelligence level of the planning and decision-making module by training it in rich, realistic traffic scenes. The training platform comprises a simulation environment layer, a data transmission layer, and a planning and decision-making layer. The simulation environment layer generates realistic traffic scenes with a traffic simulation module and simulates the intelligent vehicle's perception of, and reaction to, the environment, thereby supporting multi-scene loading. The planning and decision-making layer takes environment perception information as input and outputs the decision-making behavior of the intelligent vehicle with a deep reinforcement learning algorithm, so that the network parameters are optimized through training. The data transmission layer connects the traffic simulation module with the deep reinforcement learning framework over the TCP/IP protocol, transmitting perception information and vehicle control information between the simulation environment layer and the planning and decision-making layer.
Owner:TONGJI UNIV
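
The data transmission layer described above is, in essence, a socket bridge between the traffic simulator and the learning-based decision layer. Below is a minimal illustrative sketch of such a bridge, assuming newline-delimited JSON messages over TCP; the port, message fields, and the placeholder control rule are invented for illustration and are not taken from the patent.

```python
import json
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9500      # illustrative endpoint, not taken from the patent


def simulation_environment_layer():
    """Stand-in traffic simulator: sends perception data, receives control commands."""
    with socket.create_server((HOST, PORT)) as server:
        conn, _ = server.accept()
        with conn:
            for step in range(5):
                perception = {"step": step, "ego_speed": 10.0 + step, "gap_ahead": 30.0 - step}
                conn.sendall((json.dumps(perception) + "\n").encode())
                control = json.loads(conn.recv(4096).decode())
                print("simulator applied control:", control)


def planning_decision_layer():
    """Stand-in decision layer: maps perception to a control command (a trained RL policy in practice)."""
    with socket.create_connection((HOST, PORT)) as sock:
        reader = sock.makefile("r")
        for _ in range(5):
            perception = json.loads(reader.readline())
            accel = 0.5 if perception["gap_ahead"] > 20.0 else -1.0   # placeholder rule, not RL
            sock.sendall(json.dumps({"accel": accel, "steer": 0.0}).encode())


if __name__ == "__main__":
    sim = threading.Thread(target=simulation_environment_layer, daemon=True)
    sim.start()
    time.sleep(0.5)                  # give the simulator time to start listening
    planning_decision_layer()
    sim.join()
```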

Dynamic beam scheduling method based on deep reinforcement learning

Active · CN108966352A · Specific beam scheduling actions · With online learning function · Radio transmission · Wireless communication · Network packet · Reinforcement learning algorithm
The invention provides a dynamic beam scheduling method based on deep reinforcement learning, belonging to the field of multi-beam satellite communication systems. The method comprises the following steps: first, modeling the dynamic beam scheduling problem as a Markov decision process, in which the state of each time slot comprises the data matrix, delay matrix and channel capacity matrix of the satellite buffer, the actions represent the dynamic beam scheduling strategy, and the objective is to minimize the long-term accumulated waiting delay of all data packets; and second, solving for the best action policy with a deep reinforcement learning algorithm: a Q network with a CNN+DNN structure is established and trained, and the trained Q network is then used to make action decisions and obtain the best action policy. With this method, after a large amount of autonomous learning, the satellite directly outputs the current beam scheduling result according to the environment state at that moment, maximizes the long-term overall performance of the system, and greatly reduces the transmission waiting delay of the data packets while keeping the system throughput almost unchanged.
Owner:BEIJING UNIV OF POSTS & TELECOMM
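
The abstract describes a Q network with a CNN+DNN structure whose input state stacks the data, delay, and channel capacity matrices. The following PyTorch sketch shows one plausible shape of such a network; the grid size, layer widths, and number of beam-scheduling actions are placeholders, not values from the patent.

```python
import torch
import torch.nn as nn

N_CELLS = 8        # assumed side length of the state matrices (not from the patent)
N_ACTIONS = 16     # assumed number of candidate beam-scheduling actions


class BeamQNetwork(nn.Module):
    """CNN feature extractor followed by dense layers, outputting one Q-value per action."""

    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.dnn = nn.Sequential(
            nn.Linear(32 * N_CELLS * N_CELLS, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, state):
        return self.dnn(self.cnn(state))


if __name__ == "__main__":
    # State: data matrix, delay matrix, and channel capacity matrix stacked as 3 channels.
    state = torch.randn(1, 3, N_CELLS, N_CELLS)
    q_net = BeamQNetwork()
    q_values = q_net(state)
    greedy_action = q_values.argmax(dim=1).item()   # epsilon-greedy exploration would go here
    print("Q-values shape:", q_values.shape, "greedy action:", greedy_action)
```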

Unmanned vehicle path planning method based on improved A* algorithm and deep reinforcement learning

The invention belongs to the technical field of unmanned vehicle navigation and particularly relates to an unmanned vehicle path planning method based on an improved A* algorithm and deep reinforcement learning. The method aims to exploit both the global optimality of global path planning and the real-time obstacle avoidance of local planning, to improve the real-time performance of the A* algorithm and the complex-environment adaptability of the deep reinforcement learning algorithm, and to rapidly plan a collision-free optimal path for an unmanned vehicle from a starting point to a target point. The planning method comprises the following steps: establishing an initialized grid cost map from environmental information; planning a global path with the improved A* algorithm; designing a sliding window based on the global path and the performance of the laser radar sensor, and taking the information detected within the window as the state input of the network; and designing a local planning network with an Actor-Critic architecture on the basis of deep reinforcement learning. By combining knowledge-based and data-driven methods, an optimal path can be planned rapidly and the unmanned vehicle gains greater autonomy.
Owner:江苏泰州港核心港区投资有限公司
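
The global planning stage relies on an (improved) A* search over a grid cost map. The abstract does not detail the improvement, so the sketch below is the textbook A* baseline on a 4-connected occupancy grid with a Manhattan heuristic, for orientation only.

```python
import heapq


def astar(grid, start, goal):
    """Plain A* on a 4-connected occupancy grid (0 = free, 1 = obstacle),
    with a Manhattan-distance heuristic."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    counter = 0                                   # tie-breaker for the heap
    open_set = [(h(start), 0, counter, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, _, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:                          # reconstruct the path back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    counter += 1
                    heapq.heappush(open_set, (ng + h(nxt), ng, counter, nxt, node))
    return None                                   # no collision-free path exists


grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(astar(grid, (0, 0), (3, 3)))
```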

Micro-power-grid energy storage scheduling method and device based on deep Q-value network (DQN) reinforcement learning

Active · CN109347149A · Solved the estimation problem · Strong estimation ability · Single network parallel feeding arrangements · AC network load balancing · Decomposition · Power grid
The invention discloses a micro-power-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning. A micro-power-grid model is established; a deep Q-value network reinforcement learning algorithm is used for artificial intelligence training on that model; and the battery operating strategy for micro-power-grid energy storage scheduling is computed from the input parameter feature values. In the embodiments of the invention, deep Q-value networks manage the scheduling of micro-power-grid energy: an agent decides the optimal energy storage scheduling strategy through interaction with the environment, controls the operating mode of the battery in a constantly changing environment, dynamically determines the energy storage management features on the basis of the micro-power-grid, and enables the micro-power-grid to obtain the maximum operating benefit from its interaction with the main power grid. Using a competitive Q-value network model, the networks separately compute an evaluation value of the environment and the additional value brought by each action; this decomposition into two parts makes the learning objective more stable and accurate and strengthens the deep Q-value networks' ability to estimate the environment state.
Owner:STATE GRID HENAN ELECTRIC POWER ELECTRIC POWER SCI RES INST +3
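
The "competitive Q-value network model" that separately computes an evaluation value of the environment and the additional value brought by an action corresponds to the dueling DQN architecture. A minimal PyTorch sketch follows, with the state features and action set chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

STATE_DIM = 6    # assumed feature count (e.g. load, PV output, price, state of charge, ...)
N_ACTIONS = 5    # assumed discrete battery actions (charge/discharge levels)


class DuelingQNetwork(nn.Module):
    """Decomposes Q(s, a) into a state value V(s) and an action advantage A(s, a)."""

    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.value_head = nn.Linear(64, 1)               # evaluation value of the environment state
        self.advantage_head = nn.Linear(64, N_ACTIONS)   # additional value brought by each action

    def forward(self, state):
        features = self.shared(state)
        value = self.value_head(features)
        advantage = self.advantage_head(features)
        # Subtracting the mean advantage keeps the decomposition identifiable.
        return value + advantage - advantage.mean(dim=1, keepdim=True)


if __name__ == "__main__":
    q_net = DuelingQNetwork()
    state = torch.randn(1, STATE_DIM)
    print("Q-values:", q_net(state))
```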

Computing resource allocation and task unloading method for edge computing of super-dense network

A computing resource allocation and task unloading method for super-dense network edge computing comprises the following steps: step 1, establishing a system model of an SDN-based super-dense network edge computing network and obtaining the network parameters; step 2, obtaining the parameters required for edge computing, considering in sequence local computation, unloading to the edge server of the macro base station, and unloading to the edge server connected to small base station s, and obtaining the uplink data rate of the transmitted computation task; step 3, adopting a Q-learning scheme to obtain an optimal computing resource allocation and task unloading strategy; and step 4, adopting a DQN scheme to obtain an optimal computing resource allocation and task unloading strategy. The method suits dynamic systems by driving an intelligent agent to find an optimal solution on the basis of learned variables. Among reinforcement learning (RL) algorithms, Q-learning performs well in some time-varying networks. By combining deep learning with Q-learning, a learning scheme based on a deep Q network (DQN) is provided that optimizes the benefits of the mobile devices and the operator simultaneously in a time-varying environment; compared with the Q-learning-based method, it requires less learning time and converges faster. The method thus realizes simultaneous optimization of the benefits of mobile devices (MDs) and operators in a time-varying environment based on the DQN.
Owner:NORTHWESTERN POLYTECHNICAL UNIV
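
Step 3's Q-learning scheme can be pictured with a plain tabular update. The toy environment, state discretization, and reward below are placeholders and not the system model of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 10     # e.g. discretized network-load levels (placeholder)
N_ACTIONS = 3     # 0: compute locally, 1: offload to macro-cell edge, 2: offload to small-cell edge
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))


def step(state, action):
    """Toy environment: reward and next state are random stand-ins for delay/energy cost."""
    reward = -rng.uniform(0.0, 1.0) * (action + 1) / 3.0
    next_state = rng.integers(N_STATES)
    return reward, next_state


state = rng.integers(N_STATES)
for _ in range(10_000):
    # Epsilon-greedy action selection.
    if rng.random() < EPSILON:
        action = rng.integers(N_ACTIONS)
    else:
        action = int(np.argmax(Q[state]))
    reward, next_state = step(state, action)
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
    state = next_state

print("greedy offloading decision per state:", np.argmax(Q, axis=1))
```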

Adaptive learning control method of piezoelectric ceramics driver

Inactive · CN103853046A · Solving hysteretic nonlinear problems · High repeat positioning accuracy · Adaptive control · Hysteresis · Actuator
The invention relates to an adaptive learning control method for a piezoelectric ceramic driver. The method comprises the following steps: (1) building a dynamic hysteresis model of the piezoelectric ceramic driver and designing a control method that combines an artificial neural network with PID control; (2) adopting a reinforcement learning algorithm to tune the PID parameters adaptively online; (3) adopting a three-layer radial basis function network to approximate the policy function of the actor and the value function of the critic in the reinforcement learning algorithm; (4) feeding the system error, its first-order difference and its second-order difference into the first layer of the radial basis function network; (5) using the actor in the reinforcement learning algorithm to map the system state to the three PID parameters; and (6) using the critic in the reinforcement learning algorithm to judge the output of the actor and generate an error signal, with which the system parameters are updated. The method solves the hysteresis nonlinearity problem of the piezoelectric ceramic driver, improves the repeated positioning accuracy of the piezoelectric ceramic drive platform, and eliminates the influence of the hysteresis nonlinearity of the piezoelectric ceramics on the system.
Owner:GUANGDONG UNIV OF TECH
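
A rough sketch of the actor side described in steps (3)-(5): a radial basis function network maps the system error and its first- and second-order differences to the three PID gains, which then drive one PID step. Centres, widths, and weights are random placeholders, and the critic/temporal-difference update of step (6) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

N_CENTERS = 9                                             # hidden RBF units (placeholder)
centers = rng.uniform(-1.0, 1.0, size=(N_CENTERS, 3))     # centres over (e, de, dde)
widths = np.full(N_CENTERS, 0.5)
W = rng.normal(scale=0.1, size=(N_CENTERS, 3))            # output weights -> (Kp, Ki, Kd)


def rbf_actor(e, de, dde):
    """Three-layer RBF network: input (e, de, dde) -> Gaussian hidden layer -> PID gains."""
    x = np.array([e, de, dde])
    hidden = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * widths ** 2))
    kp, ki, kd = np.abs(W.T @ hidden)                      # gains kept non-negative in this sketch
    return kp, ki, kd


# One PID step driven by the gains the actor proposes.
setpoint, measurement = 1.0, 0.2
e_prev, e_prev2, integral = 0.0, 0.0, 0.0
e = setpoint - measurement
kp, ki, kd = rbf_actor(e, e - e_prev, e - 2 * e_prev + e_prev2)
integral += e
u = kp * e + ki * integral + kd * (e - e_prev)
print(f"gains: Kp={kp:.3f} Ki={ki:.3f} Kd={kd:.3f}, control output u={u:.3f}")
```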

Method and device for determining driving strategy based on reinforcement learning and rule

The purpose of the present application is to provide a method and a device for determining a driving strategy based on the fusion of reinforcement learning and rules. The method comprises: determining first driving strategy information for a vehicle through a reinforcement learning algorithm based on the driving parameter information of the vehicle; performing a rationality check on the first driving strategy information based on the driving parameter information and the driving rule information of the vehicle; and determining the target driving strategy information of the vehicle based on the result of the rationality check. Compared with the prior art, the technical scheme of the present application constrains the first driving strategy information computed by the reinforcement learning algorithm with rules, so the method is more intelligent than existing vehicle control methods that use a rule algorithm alone or a reinforcement learning algorithm alone, and the rationality and stability of the finally determined driving strategy are improved.
Owner:UISEE TECH BEIJING LTD
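
One way to picture the described fusion is an RL policy proposal followed by a rule-based rationality check with a conservative fallback. The rules below (a speed limit and an assumed 2-second headway) and the fallback strategy are illustrative only, not the rules of the patent.

```python
from dataclasses import dataclass


@dataclass
class DrivingState:
    speed: float          # current speed (m/s)
    gap_ahead: float      # distance to the lead vehicle (m)
    speed_limit: float    # legal speed limit (m/s)


@dataclass
class Strategy:
    target_speed: float
    lane_change: bool


def rl_policy(state: DrivingState) -> Strategy:
    """Stand-in for the first driving strategy produced by the reinforcement learning policy."""
    return Strategy(target_speed=state.speed + 3.0, lane_change=False)


def is_rational(strategy: Strategy, state: DrivingState) -> bool:
    """Rule layer: reject strategies that break the speed limit or an assumed 2 s headway rule."""
    if strategy.target_speed > state.speed_limit:
        return False
    if state.gap_ahead < 2.0 * strategy.target_speed:
        return False
    return True


def decide(state: DrivingState) -> Strategy:
    proposal = rl_policy(state)
    if is_rational(proposal, state):
        return proposal                     # RL proposal passes the rule check
    # Fallback: a conservative rule-based strategy (illustrative choice).
    return Strategy(target_speed=min(state.speed, state.speed_limit), lane_change=False)


print(decide(DrivingState(speed=20.0, gap_ahead=25.0, speed_limit=22.0)))
```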

Intelligent routing decision method based on DDPG reinforcement learning algorithm

Active · CN110611619A · Improve equalization performance · Solve the congestion problem caused by unbalanced traffic distribution · Data switching networks · Neural learning methods · Routing decision · Data center
The invention provides an intelligent routing decision method based on reinforcement learning, in particular an intelligent routing decision method based on the DDPG reinforcement learning algorithm. The method uses reinforcement learning to design intelligent routing decisions, balances the traffic load over equivalent paths, and improves the network's ability to handle burst traffic. An experience selection mechanism based on sampling probability is adopted, in which experiences with poorer performance are more likely to be selected, improving the training efficiency of the algorithm. In addition, noise is added to the neural network parameters, which encourages exploration and improves algorithm performance. The method comprises the following steps: (1) constructing a network topology structure; (2) numbering the equivalent paths in the network topology structure G0; (3) constructing a routing decision model based on the DDPG reinforcement learning algorithm; (4) initializing the traffic demand matrix DM and the equivalent path traffic proportion matrix PM; and (5) iteratively training the routing decision model based on reinforcement learning. The method can be used in scenarios such as data center networks.
Owner:XIDIAN UNIV +1
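
The "experience selection mechanism based on sampling probability", in which poorly performing experiences are replayed more often, resembles prioritized experience replay. A minimal sketch follows, with the buffer capacity and priority exponent chosen as placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)


class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions with probability proportional to |TD error|^alpha,
    so poorly predicted (worse-performing) experiences are replayed more often."""

    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.transitions, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]


buffer = PrioritizedReplayBuffer()
for step in range(200):
    transition = {"state": step, "action": rng.integers(4), "reward": rng.random()}
    buffer.add(transition, td_error=rng.normal())
batch = buffer.sample(8)
print("sampled states:", [t["state"] for t in batch])
```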