191 results for "Action selection" patented technology

Action selection is a way of characterizing the most basic problem of intelligent systems: what to do next. In artificial intelligence and computational cognitive science, "the action selection problem" is typically associated with intelligent agents and animats—artificial systems that exhibit complex behaviour in an agent environment. The term is also sometimes used in ethology or animal behavior.

Automatic driving system based on reinforcement learning and multi-sensor fusion

The invention discloses an automatic driving system based on reinforcement learning and multi-sensor fusion. The system comprises a perception system, a control system and an execution system. The perception system efficiently processes data from a laser radar, a camera and a GPS navigator through a deep learning network, achieving real-time identification and understanding of the vehicles, pedestrians, lane lines, traffic signs and signal lamps surrounding the running vehicle. Through reinforcement learning, the laser radar data and a panoramic image are matched and fused to form a real-time three-dimensional streetscape map and to determine the drivable area, and real-time navigation is achieved in combination with the GPS navigator. The control system adopts a reinforcement learning network to process the information collected by the perception system and predicts the behaviour of surrounding people, vehicles and objects. Vehicle-body state data are paired with records of driver actions, the current optimal action is selected, and the execution system completes the corresponding motion. In the invention, laser radar data and video are fused, and drivable-area identification and optimal destination-path planning are performed.
Owner: Tsinghua University Suzhou Automotive Research Institute (Wujiang)
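
The pairing of vehicle-body state data with recorded driver actions can be illustrated with a nearest-neighbour lookup. The sketch below is not the patent's implementation: the state features, the DRIVER_RECORDS data and the select_action helper are hypothetical, and a trained reinforcement learning network would replace the simple distance match, but the sketch shows how a current state can be matched against driver records to choose an action.

import math

# Hypothetical (state, action) records: (speed m/s, heading rad, gap m) -> action
DRIVER_RECORDS = [
    ((12.0, 0.00, 35.0), "keep_lane"),
    ((12.5, 0.05, 12.0), "brake"),
    ((8.0, -0.30, 40.0), "turn_left"),
]

def select_action(state):
    """Return the driver action recorded for the most similar vehicle state."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, action = min((dist(state, s), a) for s, a in DRIVER_RECORDS)
    return action

print(select_action((12.2, 0.01, 30.0)))  # -> "keep_lane"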

Dynamic spectrum access method based on policy-planning-constrained Q-learning

The invention provides a dynamic spectrum access method based on policy-planning-constrained Q-learning, which comprises the following steps: cognitive users partition the spectrum state space and select the reasonable, legal state space; the state space is ranked and modularized; each ranked module completes Q-table initialization before Q-learning begins; each module independently executes the Q-learning algorithm, selecting actions according to the learning rule and the action-selection policy; the action finally adopted by the cognitive user is obtained through a strategic decision that comprehensively considers all learning modules; whether the selected access spectrum conflicts with authorized users is determined; if so, the collision probability is computed, otherwise the next step is executed; whether the environmental policy-planning knowledge base has changed is determined; if so, the knowledge base is updated and the learned Q values are adjusted; the above steps are repeated until learning converges. The method improves overall system performance, overcomes the learning blindness of the agent, enhances learning efficiency, and speeds up convergence.
Owner:COMM ENG COLLEGE SCI & ENGINEERING UNIV PLA
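
To make the constraint idea concrete, here is a minimal single-module sketch of policy-planning-constrained Q-learning for channel selection. The ALLOWED set stands in for the policy-planning knowledge base, and the collision probabilities form a toy environment; all names and parameter values are assumptions, not the patent's algorithm.

import random

N_CHANNELS = 8
ALLOWED = {0, 1, 2, 5, 6}                        # channels legal under the policy plan
q_table = {ch: 0.0 for ch in range(N_CHANNELS)}  # one module's Q-table
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def select_channel():
    """Epsilon-greedy action selection restricted to policy-legal channels."""
    legal = sorted(ALLOWED)
    if random.random() < EPSILON:
        return random.choice(legal)
    return max(legal, key=lambda ch: q_table[ch])

def update(ch, reward):
    """Standard Q-learning update; the next-state value is the best legal Q."""
    best_next = max(q_table[c] for c in ALLOWED)
    q_table[ch] += ALPHA * (reward + GAMMA * best_next - q_table[ch])

for _ in range(1000):
    ch = select_channel()
    collided = random.random() < (0.6 if ch in (0, 1) else 0.05)  # toy environment
    update(ch, -1.0 if collided else 1.0)

print(max(ALLOWED, key=lambda c: q_table[c]))    # settles on a low-collision channel

In the full method each modularized region of the state space would hold its own Q-table, with the final action chosen by a decision over all modules.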

Medical interrogation dialogue system and reinforcement learning method applied thereto

The invention discloses a medical interrogation dialogue system and a reinforcement learning method applied to it, and relates to the technical field of medical information. The system comprises a natural language understanding module, which classifies user intentions and fills slot values to form structured semantic frames; a dialogue management module, which interacts with the user through a robot agent, takes the dialogue state as input, performs action decisions on the semantic frame through a decision network, and outputs the final system action selection; a user simulator, which carries out natural language interaction with the dialogue management module and outputs user action selections; and a natural language generation module, which receives the system and user action selections and lets the user check them by generating human-like sentences with a template-based method. In the invention, medical knowledge of the relations between diseases and symptoms is introduced as guidance, and the inquiry history is enriched through continuous interaction with a simulated patient. The reasonability of the inquired symptoms and the accuracy of disease diagnosis are improved, making the diagnosis result more credible.
Owner: DMAI (Guangzhou) Co., Ltd.
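
The module interplay can be illustrated with a toy dialogue loop. In the sketch below, a rule-based heuristic stands in for the decision network, and the two-disease KNOWLEDGE table stands in for the disease-symptom knowledge base; all names and data are hypothetical, not the patent's implementation.

KNOWLEDGE = {
    "flu":  {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
}

def system_action(state):
    """Ask about an unasked symptom while any remain, otherwise diagnose."""
    asked = state["confirmed"] | state["denied"]
    candidates = {s for syms in KNOWLEDGE.values() for s in syms} - asked
    if candidates:
        return ("request", sorted(candidates)[0])
    best = max(KNOWLEDGE, key=lambda d: len(KNOWLEDGE[d] & state["confirmed"]))
    return ("diagnose", best)

def user_simulator(action, true_disease="flu"):
    """Simulated patient confirms or denies the requested symptom."""
    _, symptom = action
    verdict = "confirm" if symptom in KNOWLEDGE[true_disease] else "deny"
    return (verdict, symptom)

state = {"confirmed": set(), "denied": set()}
while True:
    act = system_action(state)
    if act[0] == "diagnose":
        print("diagnosis:", act[1])
        break
    verdict, symptom = user_simulator(act)
    state["confirmed" if verdict == "confirm" else "denied"].add(symptom)

A learned policy would replace system_action, trading off the cost of asking another question against diagnostic confidence.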

Unmanned aerial vehicle autonomous air combat decision framework and method

The invention discloses an unmanned aerial vehicle autonomous air combat decision framework and method, and belongs to the field of computer simulation. The framework comprises a domain-knowledge-based air combat decision module, a deep network learning module, a reinforcement learning module and an air combat simulation environment. The air combat decision module generates an air combat training data set and outputs it to the deep network learning module, which learns a deep network, a Q-value fitting function and an action selection function and passes them to the reinforcement learning module. The air combat simulation environment uses the learned air combat decision function to carry out self-play air combat and records the process data to form a reinforcement learning training set; the reinforcement learning module uses this training set to optimize and improve the Q-value fitting function, obtaining an air combat strategy with better performance. The framework fits an intrinsically complex Q function more accurately and quickly, improves the learning effect, prevents the Q function from converging to a local optimum to the largest extent, and constructs a closed-loop air combat decision optimization process that needs no external intervention.
Owner:BEIHANG UNIV
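
Fitting a Q-value function on recorded transitions can be sketched with a linear approximator in place of the deep network. Everything below, including the random transition data, is a hypothetical illustration of regression toward the Bellman target rather than the patent's learning procedure.

import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_ACTIONS, GAMMA, LR = 4, 3, 0.95, 0.01
W = np.zeros((N_ACTIONS, N_FEATURES))             # one weight row per action

# Recorded transitions: (state, action, reward, next_state)
transitions = [(rng.normal(size=N_FEATURES), int(rng.integers(N_ACTIONS)),
                float(rng.normal()), rng.normal(size=N_FEATURES))
               for _ in range(500)]

def q_values(state):
    return W @ state                              # Q(s, a) for every action a

for _ in range(20):                               # sweep the training set
    for s, a, r, s_next in transitions:
        target = r + GAMMA * q_values(s_next).max()   # Bellman target
        td_error = target - q_values(s)[a]
        W[a] += LR * td_error * s                 # gradient step for action a

def select_action(state):
    """Greedy action selection from the fitted Q function."""
    return int(np.argmax(q_values(state)))

print(select_action(rng.normal(size=N_FEATURES)))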

Resource allocation method based on multi-agent reinforcement learning in mobile edge computing system

The invention discloses a resource allocation method based on multi-agent reinforcement learning in a mobile edge computing system, which comprises the following steps: (1) dividing the wireless channel into a plurality of subcarriers, where each user can select only one subcarrier; (2) letting each user randomly select a channel and computing resources, then calculating the time delay and energy consumption generated by the user's offloading; (3) comparing the delay and energy consumption of local computation with those of offloading to the edge cloud, and judging whether the offloading succeeds; (4) obtaining the reward value of the current offloading action through multi-agent reinforcement learning and calculating the value function; (5) letting each user perform action selection according to the policy function; and (6) varying the user's learning rate to update the policy and obtain the optimal action set. Based on variable-learning-rate multi-agent reinforcement learning, the computing and wireless resources of the mobile edge server are fully utilized, and the utility function of each intelligent terminal is maximized while the necessity of offloading is taken into account.
Owner:SOUTHEAST UNIV
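
One learning step per agent can be sketched as follows. The delay/energy model is a toy congestion penalty, and the per-visit decaying learning rate stands in for the variable-rate idea; all names and formulas are assumptions rather than the patent's method.

import random

N_SUBCARRIERS = 4                                 # action N_SUBCARRIERS = compute locally
agents = [{"q": [0.0] * (N_SUBCARRIERS + 1),
           "visits": [0] * (N_SUBCARRIERS + 1)} for _ in range(3)]

def cost(action, load):
    """Toy delay+energy cost: offloading is cheap unless the carrier is shared."""
    if action == N_SUBCARRIERS:
        return 5.0                                # local computation cost
    return 1.0 + 3.0 * (load[action] - 1)         # congestion penalty when shared

for _ in range(2000):
    choices = [random.randrange(N_SUBCARRIERS + 1) if random.random() < 0.1
               else max(range(N_SUBCARRIERS + 1), key=lambda a: ag["q"][a])
               for ag in agents]                  # epsilon-greedy action selection
    load = [choices.count(c) for c in range(N_SUBCARRIERS)]
    for ag, a in zip(agents, choices):
        ag["visits"][a] += 1
        lr = 1.0 / ag["visits"][a]                # learning rate decays per visit
        ag["q"][a] += lr * (-cost(a, load) - ag["q"][a])

print([max(range(N_SUBCARRIERS + 1), key=lambda a: ag["q"][a]) for ag in agents])

With the decaying rate each agent's value estimates average its observed costs, and the printed greedy choices tend to spread the agents over different subcarriers.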

Intelligent vehicle speed decision-making method based on deep reinforcement learning and simulation method thereof

The invention discloses an intelligent vehicle speed decision-making method based on deep reinforcement learning. The method comprises the steps of constructing the state space S, the action space A and the immediate reward space R of a Markov decision model for an intelligent vehicle passing through an intersection; initializing a neural network and constructing an experience pool; performing action selection with an epsilon-greedy algorithm and filling the resulting experience into the experience pool; randomly sampling part of the experience from the pool and training the neural network by stochastic gradient descent; and completing the vehicle's speed decision at the current moment according to the latest network, adding the new experience to the pool, and sampling again to train the next round of the neural network. The invention further discloses a simulation method for this speed decision-making approach, in which simulation experiments are carried out in a deep reinforcement learning simulation system built on the MATLAB Automated Driving Toolbox.
Owner:JILIN UNIV
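
The epsilon-greedy selection and experience pool described above follow a standard pattern, sketched below with a placeholder in place of the neural network. The environment transition, state layout and constants are hypothetical.

import random
from collections import deque

N_ACTIONS, EPSILON, BATCH_SIZE = 3, 0.1, 32       # e.g. accelerate / hold / brake
replay_pool = deque(maxlen=10_000)                # the experience pool

def q_network(state):
    """Placeholder for the neural network: returns Q(s, a) for each action."""
    return [0.0] * N_ACTIONS

def select_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    q = q_network(state)
    return q.index(max(q))

def train_step():
    """Sample a random minibatch from the pool for one SGD update."""
    if len(replay_pool) < BATCH_SIZE:
        return
    batch = random.sample(replay_pool, BATCH_SIZE)
    # ... compute Bellman targets from batch and take a gradient step here ...

state = (0.0, 100.0)                              # e.g. (speed, distance to intersection)
for _ in range(100):
    action = select_action(state)
    next_state, reward = state, 0.0               # toy stand-in for the simulator
    replay_pool.append((state, action, reward, next_state))
    train_step()
    state = next_state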