233 results for "Q learning algorithm" patented technology

Q-learning is a simple incremental algorithm developed from the theory of dynamic programming [Ross, 1983] for delayed reinforcement learning. In Q-learning, policies and the value function are represented by a two-dimensional lookup table indexed by state-action pairs. Formally, for each state s and action a, Q(s, a) denotes the expected discounted return obtained by taking action a in state s and following an optimal policy thereafter.
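
As a concrete illustration of the lookup-table formulation above, here is a minimal tabular Q-learning sketch on a toy one-dimensional chain; the chain environment, its small step cost and terminal reward, and all hyperparameter values are illustrative assumptions, not taken from the patents listed below.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2            # toy 1-D chain: actions 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))   # the two-dimensional lookup table Q(s, a)

def step(s, a):
    """Toy dynamics: small step cost, reward 1 when the right end of the chain is reached."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    if s_next == N_STATES - 1:
        return s_next, 1.0, True
    return s_next, -0.01, False

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    for _ in range(100):              # step cap keeps early episodes bounded
        # epsilon-greedy action selection over the current Q estimates
        a = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break

print(Q)
```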

Q-learning initialization method for mobile robot path planning

The invention discloses a reinforcement learning initialization method for a mobile robot based on an artificial potential field, and relates to a Q-learning initialization method for mobile robot path planning. The robot's working environment is modeled as an artificial potential field. The potential values of all states are determined using prior knowledge, so that the potential value of an obstacle area is zero and the target point has the largest potential value in the whole field; the potential value of each state then represents the largest cumulative return obtainable by following the optimal policy from that state. The initial Q value is then defined as the sum of the immediate reward of the current state and the maximum discounted cumulative return of the successor state. The artificial potential field maps known environmental information to initial Q-function values, integrating prior knowledge into the robot's learning system and improving the robot's learning ability in the initial stage of reinforcement learning. Compared with the traditional Q-learning algorithm, the method improves learning efficiency in the initial stage, speeds up algorithm convergence, and makes the convergence process more stable.
Owner:Shandong University (Weihai)
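
A loose sketch of the initialization idea in the abstract above: a hypothetical grid world, a potential field that is zero on obstacles and maximal at the goal, and initial Q values set to the immediate reward plus the discounted potential of the successor state. The grid size, obstacle cells, and reward values are assumptions for illustration only.

```python
import numpy as np

GAMMA = 0.9
GRID = 5                                          # hypothetical 5x5 grid world
GOAL, OBSTACLES = (4, 4), {(2, 2), (3, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

def potential(cell):
    """Potential field from prior knowledge: zero on obstacles, largest at the goal."""
    if cell in OBSTACLES:
        return 0.0
    dist = abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])
    return GAMMA ** dist                          # the goal gets the maximum value 1.0

def move(cell, action):
    r, c = cell[0] + action[0], cell[1] + action[1]
    nxt = (min(max(r, 0), GRID - 1), min(max(c, 0), GRID - 1))
    return cell if nxt in OBSTACLES else nxt

# Initial Q value: immediate reward of the current state plus the discounted
# potential (a proxy for the best achievable return) of the successor state.
Q0 = np.zeros((GRID, GRID, len(ACTIONS)))
for r in range(GRID):
    for c in range(GRID):
        for i, a in enumerate(ACTIONS):
            nxt = move((r, c), a)
            reward = 1.0 if nxt == GOAL else 0.0
            Q0[r, c, i] = reward + GAMMA * potential(nxt)
# Ordinary Q-learning would then start from Q0 instead of an all-zero table.
```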

Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning

The invention provides a mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning. A two-dimensional environment is represented by a grid method, each environment block corresponds to a discrete location, and the state of the mobile robot at a given moment is expressed by the location where it is situated. Each search step of the mobile robot is based on the Q-learning iterative formula for a non-deterministic Markov decision process; progressive sequential backtracking is carried out from the Q value at the tail of the single chain, namely the current state, back to the Q value at the head of the chain, until the target state is reached. The mobile robot cyclically and repeatedly finds paths from the initial state to the target state, each search step is carried out as described, and the Q values of the states are continuously iterated and optimized until they converge. The number of steps required to find the optimal path is far smaller than with the classic Q-learning algorithm or the Q(lambda) algorithm, the learning time is shorter, and the learning efficiency is higher; the advantages are especially pronounced in large environments.
Owner:SHANDONG UNIV
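
A rough sketch of the single-chain sequential backtracking scheme as described: after every step, the whole chain of transitions from the current episode is swept from tail to head with the ordinary Q-learning update, so new information propagates back toward the start state. The grid world, rewards, and step caps are illustrative assumptions; the patent's exact backtracking rule may differ.

```python
import numpy as np

GAMMA, ALPHA, EPSILON = 0.95, 0.5, 0.1
GRID, GOAL = 6, (5, 5)                            # hypothetical grid environment
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((GRID, GRID, len(ACTIONS)))
rng = np.random.default_rng(1)

def step(s, a_idx):
    dr, dc = ACTIONS[a_idx]
    nxt = (min(max(s[0] + dr, 0), GRID - 1), min(max(s[1] + dc, 0), GRID - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

for episode in range(100):
    s, chain = (0, 0), []                         # chain of (state, action, reward, next_state)
    for _ in range(200):                          # step cap keeps early episodes bounded
        a = rng.integers(len(ACTIONS)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        chain.append((s, a, r, s_next))
        # Sequential backtracking: after every step, sweep the whole single chain
        # from its tail (the newest transition) back to its head, re-applying the
        # Q-learning update so new information propagates toward the start state.
        for cs, ca, cr, cn in reversed(chain):
            Q[cs][ca] += ALPHA * (cr + GAMMA * np.max(Q[cn]) - Q[cs][ca])
        s = s_next
        if done:
            break
```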

Mobile robot path planning method combining a deep autoencoder and the Q-learning algorithm

Active CN105137967A · Achieve cognition · Improve the ability to process images · Biological neural network models · Position/course control in two dimensions · Algorithm · Reward value
The invention provides a mobile robot path planning method combining a deep autoencoder with the Q-learning algorithm. The method comprises a deep autoencoder part, a BP neural network part, and a reinforcement learning part. The deep autoencoder part processes images of the environment in which the robot is located, extracting features of the image data and laying a foundation for subsequent environment cognition. The BP neural network part fits reward values to the image feature data, so that the deep autoencoder can be combined with reinforcement learning. Through the Q-learning algorithm, knowledge is obtained in an action-evaluation environment via interactive learning with the environment, and the action scheme is improved to suit the environment and achieve the desired purpose. The robot interacts with the environment to learn autonomously and ultimately finds a feasible path from the start point to the end point. Combining the deep autoencoder with the BP neural network enhances the system's image processing capability and enables environment cognition.
Owner:BEIJING UNIV OF TECH
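
A minimal pipeline sketch of the three parts described above, assuming a fixed linear projection as a stand-in for the trained deep autoencoder and a one-hidden-layer BP network fitted to Q-learning targets; all shapes, learning rates, and the random stand-in images are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, FEAT_DIM, N_ACTIONS = 64, 8, 4
GAMMA, LR = 0.9, 0.01

W_enc = rng.normal(scale=0.1, size=(FEAT_DIM, IMG_DIM))   # stand-in for a trained deep autoencoder
W1 = rng.normal(scale=0.1, size=(16, FEAT_DIM))           # BP network with one hidden layer
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, 16))

def encode(image):
    """Feature extraction step (the real method would use the deep autoencoder here)."""
    return np.tanh(W_enc @ image)

def q_values(feat):
    hidden = np.tanh(W1 @ feat)
    return W2 @ hidden, hidden

def q_learning_update(img, action, reward, img_next):
    """One gradient step pushing Q(features, action) toward reward + gamma * max Q(features', .)."""
    global W1, W2
    feat, feat_next = encode(img), encode(img_next)
    q, hidden = q_values(feat)
    target = reward + GAMMA * np.max(q_values(feat_next)[0])
    err = target - q[action]
    grad_q = np.zeros(N_ACTIONS)
    grad_q[action] = -err                                  # d(0.5 * err^2) / d q[action]
    grad_hidden = (W2.T @ grad_q) * (1 - hidden ** 2)      # backpropagate through tanh
    W2 -= LR * np.outer(grad_q, hidden)
    W1 -= LR * np.outer(grad_hidden, feat)

# Usage with random stand-in "images":
q_learning_update(rng.normal(size=IMG_DIM), action=2, reward=1.0, img_next=rng.normal(size=IMG_DIM))
```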

Social network-based vehicular ad hoc network routing method

The invention discloses a social network-based vehicular ad hoc network routing method and belongs to the technical field of vehicle-mounted wireless networks. The method comprises the following steps: (1) using neighbor node information to calculate the direction angles and effective values of nodes; (2) adopting a greedy algorithm augmented with a caching mechanism for nodes on a road section, wherein intersection nodes select, within an angle threshold range, the neighbor node whose effective value is the largest and exceeds that of the current node as the next-hop transmission relay; (3) having vehicle nodes learn from their own transmission history through a Q-learning algorithm that assists the routing algorithm, with each node selecting as the next-hop forwarder the neighbor node for which the reward function reaches its maximum converged value. The complexity of the routing algorithm and the system cost are reduced, and because the Q-learning algorithm assists route selection, data packets are transmitted along the path with the minimum hop count and the delay is reduced; the packet delivery rate is improved while end-to-end delay and system resource consumption are reduced.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
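
A toy sketch of Q-learning-assisted next-hop selection: each node keeps Q values over its neighbors, learns from its own forwarding history, and greedily forwards to the neighbor with the highest learned value. The four-node topology and the reward model (bonus at the destination, per-hop cost otherwise) are illustrative assumptions, not the patent's direction-angle or effective-value metrics.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.3, 0.8, 0.1
NEIGHBORS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": []}
DEST = "D"
Q = defaultdict(float)                            # Q[(node, next_hop)]

def reward(next_hop):
    # Hypothetical reward: large bonus for reaching the destination, small
    # per-hop cost otherwise (a stand-in for hop-count / delay terms).
    return 10.0 if next_hop == DEST else -1.0

def choose_next_hop(node):
    if random.random() < EPSILON:
        return random.choice(NEIGHBORS[node])
    return max(NEIGHBORS[node], key=lambda n: Q[(node, n)])

def learn_from_forwarding(node, next_hop):
    """A node learns from its own transmission history via the Q-learning rule."""
    future = max((Q[(next_hop, n)] for n in NEIGHBORS[next_hop]), default=0.0)
    Q[(node, next_hop)] += ALPHA * (reward(next_hop) + GAMMA * future - Q[(node, next_hop)])

for _ in range(500):                              # simulate repeated packet forwarding from A
    node, hops = "A", 0
    while node != DEST and hops < 20:
        nxt = choose_next_hop(node)
        learn_from_forwarding(node, nxt)
        node, hops = nxt, hops + 1
```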

Dynamic spectrum access method based on policy-planning-constrained Q-learning

The invention provides a dynamic spectrum access method based on policy-planning-constrained Q-learning, which comprises the following steps: cognitive users partition the spectrum state space and select the reasonable and legal state space; the state space is graded and modularized; each graded module completes a Q-table initialization operation before Q-learning begins; each module independently executes the Q-learning algorithm, with actions selected according to the learning rule; the action finally adopted by the cognitive user is obtained by making a strategic decision that comprehensively considers all learning modules; whether the selected access spectrum conflicts with authorized users is determined; if so, the collision probability is computed, otherwise the next step is executed; whether the environmental policy planning knowledge base has changed is determined; if so, the knowledge base is updated and the learned Q values are adjusted; these steps are repeated until learning converges. The method improves overall system performance, overcomes the agent's learning blindness, enhances learning efficiency, and speeds up convergence.
Owner:COMM ENG COLLEGE SCI & ENGINEEIRNG UNIV PLA
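
A simplified sketch of the policy-constraint idea: a policy knowledge base restricts which channels are legal actions, Q-learning runs over the allowed set, and a knowledge-base update mid-run changes the constraint. The channel occupancy model and the single-state Q formulation are illustrative assumptions; the patent's graded, modularized state space is not reproduced.

```python
import random
from collections import defaultdict

N_CHANNELS, ALPHA, GAMMA, EPSILON = 4, 0.2, 0.9, 0.1
policy_kb = set(range(N_CHANNELS))        # channels the planning policy currently allows
Q = defaultdict(float)                    # Q[channel], single-state formulation

def primary_user_busy(ch):
    """Hypothetical occupancy model: channel 0 is almost always busy."""
    return random.random() < (0.8 if ch == 0 else 0.2)

def select_channel():
    allowed = sorted(policy_kb)           # the policy planning constrains the action set
    if random.random() < EPSILON:
        return random.choice(allowed)
    return max(allowed, key=lambda c: Q[c])

for t in range(1000):
    ch = select_channel()
    r = -1.0 if primary_user_busy(ch) else 1.0        # collision vs. successful access
    Q[ch] += ALPHA * (r + GAMMA * max(Q[c] for c in policy_kb) - Q[ch])
    if t == 500:                          # simulate an environmental policy knowledge-base update
        policy_kb.discard(0)              # channel 0 becomes illegal and is no longer selectable
```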

Method for planning paths of unmanned aerial vehicles on the basis of Q(lambda) algorithms

Active CN109655066A · Give full play to the flying ability · Solve the shortcomings of the lack of basis in the discretization process · Navigational calculation instruments · Position/course control in three dimensions · Decision model · Environmental modelling
The invention provides a method for planning paths of unmanned aerial vehicles on the basis of Q(lambda) algorithms. The method includes steps of environment modeling, initializing Markov decision process models, carrying out Q(lambda) iterative computation, and computing the optimal paths according to the state value functions. Specifically, grid spaces are initialized according to the minimum flight path segment lengths of the unmanned aerial vehicles, coordinates of the grid spaces are mapped to waypoints, and circular and polygonal threat regions are represented; Markov decision models are built by representing the flight action spaces of the unmanned aerial vehicles, designing state transition probabilities, and constructing reward functions; iterative computation is carried out on the constructed models using the Q(lambda) algorithms; and the optimal path of each unmanned aerial vehicle is computed from the finally converged state value functions. Following these optimal paths, the unmanned aerial vehicles can safely avoid the threat regions. By combining the traditional Q-learning algorithm with eligibility traces, the method increases the convergence speed and precision of the value functions and guides the unmanned aerial vehicles to avoid threat regions and plan paths autonomously.
Owner:NANJING UNIV OF POSTS & TELECOMM
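
A minimal Watkins-style Q(lambda) sketch with eligibility traces on a toy grid with threat cells; the grid, rewards, and threat layout are illustrative assumptions, and the UAV-specific waypoint mapping and transition-probability design are not shown.

```python
import numpy as np

GRID, GOAL, THREATS = 8, (7, 7), {(3, 3), (4, 5)}   # toy grid with threat cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.8, 0.1
Q = np.zeros((GRID, GRID, len(ACTIONS)))
rng = np.random.default_rng(0)

def step(s, a_idx):
    dr, dc = ACTIONS[a_idx]
    nxt = (min(max(s[0] + dr, 0), GRID - 1), min(max(s[1] + dc, 0), GRID - 1))
    if nxt in THREATS:
        return nxt, -10.0, True                      # entering a threat region ends the episode
    return nxt, (10.0 if nxt == GOAL else -0.1), nxt == GOAL

for episode in range(300):
    E = np.zeros_like(Q)                             # eligibility traces
    s = (0, 0)
    for _ in range(400):
        if rng.random() < EPSILON:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[s]))
        was_greedy = a == int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        delta = r + GAMMA * np.max(Q[s_next]) - Q[s][a]
        E[s][a] += 1.0                               # accumulate the trace of the visited pair
        Q += ALPHA * delta * E                       # every traced pair shares in the TD error
        # Watkins's Q(lambda): traces decay after greedy actions and are cut otherwise.
        E = E * (GAMMA * LAMBDA) if was_greedy else np.zeros_like(Q)
        s = s_next
        if done:
            break
```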

Reinforcement learning algorithm applied to a non-tracking intelligent trolley obstacle-avoidance system

Inactive CN105139072A · Advantages rapidity · Reduce risk · Neural learning methods · Hidden layer · Algorithm
The invention discloses a reinforcement learning algorithm that includes a new Q-learning algorithm. The new Q-learning algorithm is implemented as follows: input the collected data to a BP neural network and calculate the inputs and outputs of every unit of the hidden layer and the output layer in that state; calculate the maximum output value m in state t and, based on the output, judge whether a collision with an obstacle occurs; if a collision occurs, record every unit threshold and every connection weight of the BP neural network; otherwise move to time t+1, collect and normalize data, calculate the inputs and outputs of every unit of the hidden layer and the output layer in state t+1, calculate the expected output value for state t, adjust the outputs and thresholds of the hidden-layer units, and judge whether the error is smaller than a given threshold or the number of learning iterations exceeds a given value; if the condition is not satisfied, learn again, otherwise record every unit threshold and every connection weight and finish learning. The reinforcement learning algorithm provided by the invention has good real-time performance and rapidity, and allows relearning at a later stage.
Owner:DONGHUA UNIV
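
A compact sketch of the described training loop, assuming a one-hidden-layer BP network over sensor readings: the target for the chosen action is the reward plus the discounted maximum output m in the next state, and weight adjustment repeats until the error falls below a tolerance or an epoch limit is reached. Sensor dimensions, thresholds, and the collision reward are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SENSORS, N_HIDDEN, N_ACTIONS = 5, 10, 3        # e.g. steer left / straight / right
GAMMA, LR, ERR_TOL, MAX_EPOCHS = 0.9, 0.05, 1e-3, 50
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_SENSORS))
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, N_HIDDEN))

def forward(x):
    h = np.tanh(W1 @ x)                          # hidden-layer outputs
    return W2 @ h, h                             # output-layer Q estimates

def train_step(x_t, action, reward, x_next):
    """Fit Q(x_t, action) to reward + gamma * m, where m is the maximum output
    in the next state, relearning until the error is small or an epoch limit is hit."""
    global W1, W2
    m = np.max(forward(x_next)[0])               # maximum output value at t+1
    target = reward + GAMMA * m                  # expected output value for state t
    for _ in range(MAX_EPOCHS):
        q, h = forward(x_t)
        err = target - q[action]
        if abs(err) < ERR_TOL:
            break                                # error below threshold: stop relearning
        grad_out = np.zeros(N_ACTIONS)
        grad_out[action] = -err
        grad_h = (W2.T @ grad_out) * (1 - h ** 2)
        W2 -= LR * np.outer(grad_out, h)         # adjust output-layer weights
        W1 -= LR * np.outer(grad_h, x_t)         # adjust hidden-layer weights

# Usage: a collision with an obstacle yields a negative reward for the chosen action.
train_step(rng.random(N_SENSORS), action=1, reward=-1.0, x_next=rng.random(N_SENSORS))
```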

Q-algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in an unlicensed frequency band

The invention relates to a Q-algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in an unlicensed frequency band, and belongs to the technical field of wireless communication. With the aid of a Q-learning algorithm, the transmission time of the LTE-U system in the next transmission period T is obtained according to the total throughput of the LTE-U and Wi-Fi systems in the current transmission period T, so that the expected network throughput is achieved and the coexistence performance of the two systems is improved. The method starts from the perspective of improving overall network throughput and assumes that the Wi-Fi system transmits data throughout T. The LTE-U system determines its transmission time ratio within T according to the result of the Q-learning algorithm, and then records the overall network throughput under that ratio in the current T to provide a basis for selecting the transmission ratio in the next Q-learning iteration. The method fully considers the situation in which Wi-Fi throughput declines sharply due to transmission interference from the LTE-U system when the two coexist, and it can be applied to scenarios with moving users under the premise of LTE-U and Wi-Fi coexistence.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
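
A toy sketch of the per-period loop: the LTE-U duty ratio for the next period T is chosen by Q-learning from the total throughput recorded in the current period. The throughput function below, in which Wi-Fi throughput collapses as its share of the period shrinks, is a made-up stand-in for real measurements.

```python
import random
from collections import defaultdict

DUTY_RATIOS = [0.2, 0.4, 0.6, 0.8]            # candidate LTE-U shares of the period T
ALPHA, GAMMA, EPSILON = 0.2, 0.5, 0.1
Q = defaultdict(float)                        # Q[duty_ratio]

def total_throughput(duty):
    # Made-up stand-in: LTE-U throughput grows with its share of T, while Wi-Fi
    # throughput collapses quickly once it suffers too much transmission interference.
    lte = 50.0 * duty
    wifi = 40.0 * (1.0 - duty) ** 2
    return lte + wifi + random.gauss(0.0, 1.0)

def pick_duty():
    if random.random() < EPSILON:
        return random.choice(DUTY_RATIOS)
    return max(DUTY_RATIOS, key=lambda d: Q[d])

for period in range(500):                     # one iteration per transmission period T
    duty = pick_duty()
    reward = total_throughput(duty)           # recorded network throughput under this ratio
    Q[duty] += ALPHA * (reward + GAMMA * max(Q[d] for d in DUTY_RATIOS) - Q[duty])
```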

Cognitive radio space-frequency two-dimensional anti-hostile-jamming method based on deep reinforcement learning

The invention discloses a cognitive radio space-frequency two-dimensional anti-hostile-jamming method based on deep reinforcement learning. Without knowing the jammer's attack pattern or the wireless channel environment, a cognitive radio secondary user observes the access state of the cognitive radio primary user and the signal-to-jamming ratio of the wireless signal, and uses a deep reinforcement learning mechanism to decide whether to leave the interfered region or to select an appropriate frequency point for transmitting the signal. A deep convolutional neural network is combined with Q-learning: Q-learning learns the optimal anti-jamming strategy in the wireless dynamic game, while the observed states and obtained rewards are fed into the deep convolutional neural network as a training set to accelerate learning. The deep reinforcement learning mechanism improves the communication efficiency of the cognitive radio against hostile jammers in a dynamically changing wireless network environment, and overcomes both the slowdown caused by an artificial neural network having to classify the data during training and the large state-set and action-set dimensionality of the Q-learning algorithm.
Owner:XIAMEN UNIV
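
A skeleton of the deep-network-plus-Q-learning combination, written with PyTorch as an assumption: a tiny convolutional network maps observed per-frequency signal-to-jamming ratios to Q values over frequency choices plus a "leave the region" action, and the Q-learning target supplies the regression loss. Network shape, state encoding, and the action set are illustrative guesses, not the patent's architecture.

```python
import random
import torch
import torch.nn as nn

N_FREQS = 8                                   # actions: pick a frequency, or index N_FREQS = leave the region
N_ACTIONS = N_FREQS + 1
GAMMA, EPSILON = 0.9, 0.1

class QNet(nn.Module):
    """Tiny stand-in for the deep convolutional network over spectrum observations."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=3, padding=1)
        self.head = nn.Linear(8 * N_FREQS, N_ACTIONS)
    def forward(self, x):                     # x: (batch, 1, N_FREQS) observed SJR per frequency
        h = torch.relu(self.conv(x))
        return self.head(h.flatten(1))

net = QNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def act(state):
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(net(state).argmax())

def train_step(state, action, reward, next_state):
    """Q-learning target r + gamma * max Q(s', .) used as the regression target for the chosen action."""
    with torch.no_grad():
        target = reward + GAMMA * net(next_state).max()
    q = net(state)[0, action]
    loss = (q - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

# Usage with random stand-in observations of the signal-to-jamming ratio:
s = torch.rand(1, 1, N_FREQS)
s2 = torch.rand(1, 1, N_FREQS)
train_step(s, act(s), reward=1.0, next_state=s2)
```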

Path planning method based on multi-agent reinforcement learning

The invention discloses a path planning method based on multi-agent reinforcement learning, and belongs to the technical field of aircraft. First, a global state division model of the air flight environment is established, a global state transfer table Q-Table 1 is initialized, and the global state of a certain row is randomly selected as the initial state s1. Among all columns of the current state s1, an epsilon-greedy algorithm is adopted to select a certain column, recorded as behavior a1. Based on the selected behavior a1, the next state of the current state s1 (see the formula in the specification) is obtained from the global state transfer table Q-Table 1. The element of Q-Table 1 corresponding to the current state s1 and behavior a1 is updated according to the transfer rule of the Q-learning algorithm; the state (see the formula in the specification) is updated to enter an inner-layer cycle, and the local planned path corresponding to the updated state s1 is obtained using the Q-learning algorithm. The number of iterations of the outer-layer cycle is incremented by 1 until it reaches N1, completing global path planning of the aircraft in the air. The aircraft can meet the requirements of different environments, so its survival rate and task completion rate are improved, and the convergence speed of the reinforcement learning is increased.
Owner:BEIHANG UNIV
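
A very rough two-level sketch of the nested loops described above: an outer loop updates a global table (Q-Table 1) over regions with an epsilon-greedy choice of the next region, and an inner loop runs ordinary Q-learning for local planning inside the chosen region. The region layout, rewards, and the stub local planner are illustrative assumptions and far simpler than the patent's models.

```python
import numpy as np

N_REGIONS, N_LOCAL_CELLS, N_ACTIONS = 4, 10, 4
ALPHA, GAMMA, EPSILON, N1 = 0.1, 0.9, 0.1, 200
rng = np.random.default_rng(0)

q_global = np.zeros((N_REGIONS, N_REGIONS))        # Q-Table 1: region -> next region
q_local = np.zeros((N_REGIONS, N_LOCAL_CELLS, N_ACTIONS))

def epsilon_greedy(row):
    return rng.integers(len(row)) if rng.random() < EPSILON else int(np.argmax(row))

def local_planning(region):
    """Inner loop: ordinary Q-learning inside one region; returns a stand-in path cost."""
    s = 0
    for _ in range(N_LOCAL_CELLS):
        a = epsilon_greedy(q_local[region, s])
        s_next = min(s + 1, N_LOCAL_CELLS - 1)     # toy transition ignores the chosen action
        r = 1.0 if s_next == N_LOCAL_CELLS - 1 else -0.1
        q_local[region, s, a] += ALPHA * (r + GAMMA * np.max(q_local[region, s_next]) - q_local[region, s, a])
        s = s_next
    return -0.1 * N_LOCAL_CELLS                    # hypothetical cost of crossing the region

for outer in range(N1):                            # outer-layer cycle over global states
    s1 = rng.integers(N_REGIONS)                   # randomly chosen initial global state
    a1 = epsilon_greedy(q_global[s1])              # next region chosen epsilon-greedily
    reward = local_planning(a1) + (10.0 if a1 == N_REGIONS - 1 else 0.0)
    q_global[s1, a1] += ALPHA * (reward + GAMMA * np.max(q_global[a1]) - q_global[s1, a1])
```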

Internet of Vehicles transmission method and device based on information freshness

The embodiment of the invention provides an Internet of Vehicles transmission method and device based on information freshness, applied in the technical field of the Internet of Vehicles. The method comprises the following steps: obtaining an updated objective function from an initial objective function and the fact that the information age of the vehicle collection node is a fixed value when the vehicle collection node does not send a data packet within a preset time period T; constraining the network by minimizing the change rate of a Lyapunov function to obtain a final objective function; calculating an upper bound of the final objective function at T=1; determining a suboptimal solution that minimizes the final objective function by minimizing this upper bound, thereby obtaining an update strategy function and a routing algorithm function; determining the update strategy according to the update strategy function; determining a measurement factor according to the Q-learning algorithm and the routing algorithm function; and having the vehicle collection node make information update decisions according to the update strategy and select the neighbor node with the maximum measurement factor to transmit the data packet. The method and device reduce both computational complexity and information age.
Owner:BEIJING UNIV OF POSTS & TELECOMM
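
A toy sketch of selecting a next hop by a Q-learning-derived measurement factor: the factor below blends the learned long-run value of a neighbor with its immediate link delay (a stand-in for information-age terms), and the neighbor with the largest factor forwards the packet. The topology, delay model, and weighting are illustrative assumptions; the Lyapunov-based derivation of the update strategy is not reproduced.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.3, 0.8, 0.1
NEIGHBORS = {"v1": ["v2", "v3"], "v2": ["rsu"], "v3": ["rsu"], "rsu": []}
Q = defaultdict(float)                        # Q[(node, neighbor)]

def link_delay(a, b):
    """Hypothetical per-link delay: one link is noticeably slower than the others."""
    return random.uniform(0.5, 2.0) if (a, b) == ("v1", "v2") else random.uniform(0.1, 0.5)

def measurement_factor(node, neighbor):
    # Hypothetical factor: learned long-run value minus the immediate delay,
    # so fresher (lower-delay, lower-age) routes score higher.
    return Q[(node, neighbor)] - link_delay(node, neighbor)

def forward_packet(node):
    while NEIGHBORS[node]:
        nbrs = NEIGHBORS[node]
        if random.random() < EPSILON:
            nxt = random.choice(nbrs)
        else:
            nxt = max(nbrs, key=lambda n: measurement_factor(node, n))
        reward = -link_delay(node, nxt)       # smaller delay keeps the information age low
        future = max((Q[(nxt, n)] for n in NEIGHBORS[nxt]), default=0.0)
        Q[(node, nxt)] += ALPHA * (reward + GAMMA * future - Q[(node, nxt)])
        node = nxt

for _ in range(300):
    forward_packet("v1")
```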