
330 results about "Q-learning" patented technology

Q-learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.
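
As a point of reference for the patents below, here is a minimal sketch of the tabular Q-learning update; the environment interface (env.reset, env.step, env.actions) and the hyperparameters are illustrative assumptions, not taken from any of the listed inventions.

```python
# Minimal tabular Q-learning sketch: epsilon-greedy exploration plus the
# standard model-free update. The env interface is an assumption.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # model-free update: no transition probabilities are needed
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward if done else reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```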

AUV (Autonomous Underwater Vehicle) three-dimensional path planning method based on reinforcement learning

The invention provides an AUV (Autonomous Underwater Vehicle) three-dimensional path planning method based on reinforcement learning. The method comprises the following steps: firstly, modeling the known underwater working environment and performing global path planning for the AUV; secondly, designing a reward value specific to the working environment and the planning target of the AUV, performing obstacle avoidance training on the AUV with a Q-learning method improved on the basis of a self-organizing neural network, and writing the obstacle avoidance strategy obtained by training into the internal control system of the robot; and finally, after the robot enters the water, receiving the global path planning nodes, calculating a target heading plan with the global path planning nodes as the target nodes for planning a route, and avoiding obstacles with the obstacle avoidance strategy when sudden obstacles appear. The method ensures both the economy of the AUV's planned path and its safety when sudden obstacles appear. Meanwhile, route planning accuracy is improved, planning time is shortened, and the environmental adaptability of the AUV is enhanced. The method can be applied to an AUV that carries an obstacle avoidance sonar and can navigate autonomously.
Owner:HARBIN ENG UNIV
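
The abstract above hinges on designing a reward for obstacle-avoidance training. As an illustration only, the sketch below shows one plausible way such a reward could combine goal progress and obstacle proximity; the thresholds, weights and function names are assumptions and do not reproduce the patented self-organizing-network-improved Q-learning.

```python
# Illustrative reward shaping for AUV obstacle-avoidance training in 3-D.
import math

def auv_reward(pos, goal, obstacle_dist, collided, reached, d_safe=5.0):
    """Reward shaped from progress toward the waypoint and obstacle proximity."""
    if collided:
        return -100.0                      # hard penalty for hitting an obstacle
    if reached:
        return 100.0                       # large bonus for reaching the target node
    goal_dist = math.dist(pos, goal)       # Euclidean distance in 3-D
    progress = -0.1 * goal_dist            # encourage moving toward the waypoint
    danger = -10.0 * max(0.0, (d_safe - obstacle_dist) / d_safe)
    return progress + danger
```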

AGV (Automated Guided Vehicle) route planning method and system based on ant colony algorithm and multi-intelligent agent Q learning

The invention discloses an AGV (Automated Guided Vehicle) route planning method and system based on an ant colony algorithm and multi-agent Q-learning. By introducing multi-agent Q-learning into AGV route planning, the method improves global optimization ability, enables the AGV to learn how to avoid obstacles through interaction, and better exploits the independence and learning capacity of the AGV. The method comprises: according to the static environment, modeling the AGV operation environment with a grid method and setting an initial point and a target point; generating a global optimal route with the ant colony algorithm according to the coordinates of the initial point and the target point of the AGV; and enabling the AGV to move toward the target point along the global optimal route, and, when a dynamic obstacle is detected within a minimum distance, selecting an obstacle avoidance strategy from the environment state corresponding to the multi-agent Q-learning so as to take the corresponding obstacle avoidance action, and returning to the original route after obstacle avoidance ends.
Owner:YTO EXPRESS CO LTD
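
The hand-off between the ant-colony global route and the learned local avoidance policy might look roughly like the sketch below; the AGV interface (agv.observe, agv.execute, etc.), the Q-policy object and the distance threshold are assumed for illustration.

```python
# Sketch: follow the globally planned waypoints, but defer to the learned
# Q-learning avoidance policy whenever a dynamic obstacle gets too close.
def follow_route(agv, global_route, q_policy, min_dist=2.0):
    for waypoint in global_route:
        while not agv.at(waypoint):
            if agv.nearest_obstacle_distance() < min_dist:
                # hand control to the multi-agent Q-learning avoidance policy
                state = agv.observe()
                action = q_policy.best_action(state)
                agv.execute(action)
            else:
                agv.move_toward(waypoint)  # resume the ant-colony global route
```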

Resource allocation method for reinforcement learning in ultra-dense network

A resource allocation method for reinforcement learning in an ultra-dense network is provided. The invention relates to the field of ultra-dense networks in 5G (fifth generation) mobile communications and provides a method for allocating resources between a home node B and a macro node B, between home node Bs, and between a home node B and a mobile user in a densely deployed network. The method is implemented through power control: each femtocell is treated as an agent that jointly adjusts the transmitting powers of the home node Bs, so that densely deployed home node Bs transmitting at maximum power do not cause severe interference to the macro node B and adjacent home node Bs, and system throughput is maximized. User delay QoS is considered, and the traditional "Shannon capacity" is replaced with an "available capacity" that can guarantee user delay. A supermodular game model is used so that the power allocation of the whole network reaches Nash equilibrium, and the reinforcement learning method Q-learning gives the home node B a learning function so that optimal power allocation can be achieved. With this resource allocation method, the system capacity of an ultra-dense network can be effectively improved on the premise of satisfying user delay.
Owner:BEIJING UNIV OF CHEM TECH
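
A per-femtocell Q-learning agent that picks a discrete transmit power, as the abstract describes at a high level, could be sketched as follows; the power levels, state encoding and reward are assumptions rather than the patented scheme.

```python
# Illustrative per-femtocell power-selection agent using tabular Q-learning.
import random

POWER_LEVELS = [5, 10, 15, 20]          # dBm, the discrete action set (assumed)

class FemtocellAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_power(self, state):
        """Epsilon-greedy choice of transmit power for the current state."""
        if random.random() < self.epsilon:
            return random.choice(POWER_LEVELS)
        return max(POWER_LEVELS, key=lambda p: self.Q.get((state, p), 0.0))

    def update(self, state, power, reward, next_state):
        """Standard Q update; reward would trade capacity against interference."""
        best_next = max(self.Q.get((next_state, p), 0.0) for p in POWER_LEVELS)
        old = self.Q.get((state, power), 0.0)
        self.Q[(state, power)] = old + self.alpha * (reward + self.gamma * best_next - old)
```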

Deep Q-learning-based UAV (unmanned aerial vehicle) environment perception and autonomous obstacle avoidance method

The invention belongs to the field of environment perception and autonomous obstacle avoidance of quadrotor unmanned aerial vehicles and relates to a deep Q-learning-based UAV (unmanned aerial vehicle) environment perception and autonomous obstacle avoidance method. The invention aims to reduce resource loss and cost and satisfy the real-time, robustness and safety requirements of autonomous obstacle avoidance for an unmanned aerial vehicle. According to the method provided by the technical scheme of the invention, a radar is used to detect the path within a certain distance in front of the unmanned aerial vehicle, so that the distance between the radar and an obstacle and the distance between the radar and a target point are obtained and adopted as the current state of the unmanned aerial vehicle; during training, a neural network is used to approximate the deep-learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; and when the training result gradually converges, a greedy algorithm is used to select the optimal action for the unmanned aerial vehicle in each specific state, so that autonomous obstacle avoidance of the unmanned aerial vehicle is realized. The method of the invention is mainly applied to unmanned aerial vehicle environment perception and autonomous obstacle avoidance control.
Owner:TIANJIN UNIV
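
The abstract's core idea, approximating the Q value with a neural network and acting greedily once training converges, is sketched below assuming PyTorch; the network size, the two-dimensional state layout and the action count are illustrative assumptions.

```python
# Sketch of neural Q-value approximation and greedy action selection.
import torch
import torch.nn as nn

N_ACTIONS = 5                        # assumed discrete set of flight maneuvers

q_net = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),     # state: [distance to obstacle, distance to target]
    nn.Linear(64, N_ACTIONS),
)

def greedy_action(state):
    """Pick the action with the highest predicted Q value for this state."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values))
```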

Machine workshop task scheduling energy-saving optimization system based on reinforcement learning

The invention discloses a machine workshop task scheduling energy-saving optimization system based on reinforcement learning. The system comprises a scheduling objective function module, a basic database, a scheduling rule library, a primary scheduling execution module, an energy-saving optimization rule library, an energy-saving optimization module and a scheduling scheme library. Firstly, scheduling targets are set in the scheduling objective function module, and primary scheduling is executed in the primary scheduling execution module by selecting scheduling rules from the scheduling rule library and using the basic data of the basic database to obtain primary scheduling schemes. Then, the energy-saving optimization experience of experts is entered through a knowledge expansion port outside the system to obtain energy-saving optimization rules. Meanwhile, a learning environment comprising a workshop energy consumption model, a processing time model and a scheduling scheme evaluation model is built in the energy-saving optimization module. Finally, an energy-saving optimization strategy for the primary scheduling schemes is obtained through interaction between a Q-learning controller and the learning environment. The system can reduce energy consumption in the machine workshop production process by means of scheduling and is of great significance for energy saving and emission reduction in machine workshops.
Owner:CHONGQING UNIV
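
The interaction between the Q-learning controller and the learning environment might be organized roughly as below; the rule names, the state buckets and the energy-minus-time reward are placeholders rather than the patented rule library.

```python
# Sketch of a Q-learning controller iteratively applying energy-saving rules
# to a primary schedule; env.evaluate/env.apply are assumed interfaces.
import random

RULES = ["merge_idle_machines", "shift_to_off_peak", "resequence_jobs"]

def optimize_schedule(env, schedule, episodes=200, alpha=0.2, gamma=0.9, eps=0.1):
    Q = {}
    for _ in range(episodes):
        state = env.evaluate(schedule)          # e.g. (energy bucket, makespan bucket)
        rule = (random.choice(RULES) if random.random() < eps
                else max(RULES, key=lambda r: Q.get((state, r), 0.0)))
        new_schedule, energy_saved, time_penalty = env.apply(schedule, rule)
        reward = energy_saved - time_penalty    # assumed trade-off signal
        next_state = env.evaluate(new_schedule)
        best_next = max(Q.get((next_state, r), 0.0) for r in RULES)
        old = Q.get((state, rule), 0.0)
        Q[(state, rule)] = old + alpha * (reward + gamma * best_next - old)
        schedule = new_schedule
    return schedule, Q
```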

Cascade reservoir random optimization scheduling method based on deep Q learning

Pending · CN110930016A · Solve the fundamental instability problem of the approximation · Effectively deal with the "curse of dimensionality" problem · Forecasting · Design optimisation/simulation · Algorithm · Transition probability matrix
The invention discloses a cascade reservoir stochastic optimization scheduling method based on deep Q-learning. The method comprises the following steps: describing the runoff process of the reservoir; establishing a Markov decision process (MDP) model; establishing a probability transition matrix; establishing a cascade reservoir stochastic optimization scheduling model; determining the constraint functions of the model; introducing a deep neural network to extract the runoff state characteristics of the cascade reservoirs while realizing approximate representation and optimization of the target value function of the scheduling model; applying reinforcement learning to stochastic reservoir scheduling; establishing a DQN model; and solving the cascade reservoir stochastic optimization scheduling model with a deep reinforcement learning algorithm. With the cascade reservoir stochastic optimization scheduling method based on deep Q-learning, stochastic optimization scheduling of cascade reservoirs is realized, so that the generator sets are fully utilized over the scheduling period, the power demand and the various constraint conditions are met, and the average annual power generation income is maximized.
Owner:CHINA THREE GORGES UNIV
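
A standard way to address the approximation-instability issue mentioned in the entry is a DQN update against a separate target network, sketched below assuming PyTorch; the batch format, optimizer and discount factor are assumptions, not details from the patent.

```python
# One DQN training step with a frozen target network (standard stabilizer).
import torch
import torch.nn as nn

def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch: tensors (states, int64 actions, rewards, next_states)
    states, actions, rewards, next_states = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_target = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```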

Stability control method for four-wheel independent driving electric automobile based on Q-learning

The invention discloses a stability control method for a four-wheel independent driving electric automobile based on Q-learning. The control method comprises the following steps: selecting corresponding optimal control parameters based on the actual external conditions and calculating an ideal control torque with those parameters; calculating the sliding-mode control parameters K of the yaw torque controller under different external conditions, and storing the different external conditions and the corresponding sliding-mode control parameters K of the yaw torque controller into the stability control system; and reasonably distributing the calculated ideal control torque to the four wheels. In the disclosed stability control method, the control parameters that would otherwise require online calculation are found by Q-learning and stored in the yaw torque controller, so that during operation the yaw torque controller of the four-wheel independent driving electric automobile can directly retrieve the control parameters by table lookup. The calculation time is thus greatly shortened, and the real-time capability, robustness and practicability of the stability control system of the four-wheel independent driving electric automobile are improved.
Owner:DALIAN UNIV OF TECH
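
The table-lookup idea, gains found offline by Q-learning and retrieved online by operating condition, can be illustrated as follows; the condition buckets and gain values are placeholders, not the patented table.

```python
# Sketch: sliding-mode gains K, learned offline, stored per discretized
# operating condition and read back by simple table lookup at runtime.
K_TABLE = {
    # (speed bucket, road-adhesion bucket): sliding-mode gain K (placeholder values)
    ("low",  "dry"): 0.8,
    ("low",  "wet"): 1.1,
    ("high", "dry"): 1.4,
    ("high", "wet"): 1.9,
}

def lookup_gain(speed_kmh, adhesion):
    """Map measured conditions to buckets and return the stored gain."""
    speed_bucket = "low" if speed_kmh < 60 else "high"
    road_bucket = "dry" if adhesion > 0.6 else "wet"
    return K_TABLE[(speed_bucket, road_bucket)]   # no online optimization needed
```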

Distributed formation method of unmanned aerial vehicle cluster based on reinforcement learning

The invention discloses a distributed formation method for an unmanned aerial vehicle cluster based on reinforcement learning. The distributed formation method comprises the following steps: step (1), a formation target state function and a simulation model of environmental uncertainty factors are obtained, and an unmanned aerial vehicle formation simulation model is established; step (2), under the interference of the environmental uncertainty factors, and based on the unmanned aerial vehicle formation simulation model established in step (1), a Q-learning method is adopted to train the unmanned aerial vehicle cluster to update a flight strategy table; step (3), the completion degree of the formation target state is calculated from the formation target state function and compared with a preset value, and whether the formation target state has been reached is judged from the comparison result: if it has, step (4) is performed, otherwise step (2) is repeated; and step (4), the updated flight strategy table is saved. The distributed formation method provides adaptive flight strategy parameters for the cluster and guarantees the stability and robustness of unmanned aerial vehicle cluster formation.
Owner:XIDIAN UNIV
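
Steps (2) to (4) amount to a train-until-target loop, sketched below; the swarm interface, completion-degree function and threshold are assumed stand-ins.

```python
# Sketch of the train-until-target loop described in steps (2)-(4).
def train_formation(swarm, simulator, completion_degree, target=0.95, max_rounds=1000):
    for _ in range(max_rounds):
        swarm.q_learning_round(simulator)               # step (2): update flight strategy table
        if completion_degree(swarm.state()) >= target:  # step (3): compare with preset value
            break
    swarm.save_strategy_table()                         # step (4): persist the learned table
```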

Energy efficiency-oriented multi-agent deep reinforcement learning optimization method for unmanned aerial vehicle group

The invention discloses an energy-efficiency-oriented multi-agent deep reinforcement learning optimization method for an unmanned aerial vehicle group. The method adopts an improved DQN deep reinforcement learning method based on Q-learning: the neural network of each agent is trained and updated with the historical information of the unmanned aerial vehicle cluster to obtain the channel selection and power selection decisions of each agent, a short-time experience replay mechanism is used to train the neural networks during training, and the optimization target of each neural network is to maximize the energy efficiency value of the corresponding agent. The invention adopts a distributed multi-agent deep reinforcement learning method and sets a short-time experience replay mechanism to train the neural networks to mine the change rules contained in the dynamic network environment. This solves the problem that traditional reinforcement learning cannot obtain a convergent solution in a large state space. Multi-agent distributed cooperative learning is achieved, the energy efficiency of unmanned aerial vehicle cluster communication is improved, the life cycle of the unmanned aerial vehicle cluster is prolonged, and the dynamic adaptive capacity of the unmanned aerial vehicle cluster communication network is enhanced.
Owner:YANGTZE NORMAL UNIVERSITY
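
The short-time experience replay mechanism can be read as a bounded buffer that keeps only recent transitions so training tracks a changing network environment; a sketch follows, with the window length and batch size as assumptions.

```python
# Sketch of a short-time (bounded) experience replay buffer.
import random
from collections import deque

class ShortTimeReplay:
    def __init__(self, window=500):
        self.buffer = deque(maxlen=window)     # old experience falls out automatically

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        """Return a random minibatch of recent transitions."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```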

Q-learning based vehicular ad hoc network routing method

The invention relates to a Q-learning based vehicular ad hoc network routing method and belongs to the technical field of Internet-of-Things communication. The method includes the following steps: (1) each vehicle in the network is equipped with a GPS (global positioning system), and vehicles acquire neighbor node information by exchanging Hello messages; (2) the city region is divided into equal grids, where each grid position represents a different state and moving from one grid to an adjacent grid represents an action; (3) a Q-value table is learned; (4) parameters are set; (5) the routing strategies QGrid_G and QGrid_M are selected. Vehicles newly added to the network acquire the Q-value table obtained by offline learning from neighboring vehicles, and by querying the Q-value table for a message's destination grid a vehicle can determine the optimal next-hop grid for message forwarding. From a macroscopic point of view, the method considers the grid sequence that vehicles most frequently travel; from a microscopic point of view, it selects the vehicle most likely to arrive at the optimal next-hop grid. This combination of macroscopic and microscopic perspectives effectively increases the delivery success rate of messages in the urban traffic network.
Owner:BEIJING INSTITUTE OF TECHNOLOGY
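
The online lookup step, querying the offline-learned Q-value table for the message's destination grid to get the best next-hop grid, might look like the sketch below; the table layout (one table per destination grid, keyed by current and neighbor grid) is an assumption.

```python
# Sketch of the next-hop grid lookup from an offline-learned Q-value table.
def best_next_hop_grid(q_tables, current_grid, dest_grid, neighbor_grids):
    table = q_tables[dest_grid]                # one Q-value table per destination grid
    return max(neighbor_grids, key=lambda g: table.get((current_grid, g), 0.0))
```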

Multi-intersection self-adaptive phase difference coordinated-control system and method based on Q learning

The invention discloses a multi-intersection self-adaptive phase difference coordinated-control system and method based on Q learning. The system comprises an intersection control module, a coordinated-control module, a Q learning control module, a regulation module and an output execution module. The intersection control module provides a reasonable single-intersection traffic timing plan for the current phase according to the traffic state of the local intersection. The coordinated-control module judges whether the current phase needs phase difference coordination by analyzing the traffic states of the local and adjacent intersections. The multi-intersection self-adaptive phase difference coordinated-control method can effectively shorten the response time to traffic jams, quickly coordinate the signal control of intersections and improve intersection throughput, and has very good universality in self-adaptive traffic signal control applications. Compared with coordination control without accurate timing, the system can give accurate and reasonable green light timing plans through phase coordination and is better suited to large intersections with heavy traffic flow.
Owner:NANJING UNIV OF POSTS & TELECOMM

Question analyzing method and system for interactive question answering

The invention discloses a question analyzing method and system for interactive question answering. The method includes the following steps: S1, determining the question information, preprocessing it and extracting multiple pieces of feature information from it; S2, extracting a plurality of keywords from the question information according to the feature information and determining a plurality of properties corresponding to the keywords; S3, determining whether the semantic meaning of the question information is complete in combination with the properties, and if not, executing step S4, otherwise executing step S5; S4, restoring the semantic meaning of the question information in combination with the historical conversation based on a deep reinforcement learning method, using the restored question information as new question information and executing steps S1 to S3; S5, conducting template matching on the question information according to the properties and classifying the question information by type. A deep Q-Learning method based on deep reinforcement learning is added, so that the accuracy of semantic restoration is improved and industrial application requirements are better met.
Owner:HUAZHONG UNIV OF SCI & TECH

Method for planning paths on the basis of ocean current prediction models

The invention belongs to the field of underwater robot control and discloses a method for planning paths on the basis of ocean current prediction models. The method includes the following steps: rasterizing the sailing regions according to the path critical points; predicting the ocean current for the sailing regions with regional ocean models and acquiring real-time ocean current information by fitting; marking prohibited areas with electronic ocean map information; storing the prohibition information at different depths and the start point and end point location information in planar grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a prohibited area and whether the end point has been reached; computing the directions from the current locations to the end points and determining the optional travel-direction actions at each next step; and seeking the optimal strategies with Q-learning and outputting the paths. The optimal strategies are planned as a Markov decision process. The method fully considers the influence of real-time ocean current on path planning, uses BP (back propagation) neural networks and bagging algorithms for fitting, and seeks the optimal solution by reinforcement learning, so that convergence is accelerated and computational complexity is reduced.
Owner:HARBIN ENG UNIV
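
The grid storage of prohibited areas implies that Q-learning actions are restricted to travel directions that stay inside allowed cells; a small illustrative filter is sketched below, with the grid interface and action set as assumptions.

```python
# Sketch: keep only moves whose target grid cell exists and is not prohibited.
ACTIONS = ["N", "S", "E", "W", "up", "down"]

def allowed_actions(grid, cell):
    moves = []
    for a in ACTIONS:
        target = grid.neighbor(cell, a)        # None when the move leaves the grid
        if target is not None and not grid.is_prohibited(target):
            moves.append(a)
    return moves
```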