53 results for "Strategic learning" patented technology

Dialog strategy online realization method based on multi-task learning

The invention discloses an online dialog strategy implementation method based on multi-task learning. Corpus information from a human-machine dialog is acquired in real time, the current user state features and user action features are extracted, and these are combined to form the training input. The single accumulated reward value used in dialog strategy learning is then split into a dialog-turn-count reward value and a dialog-success reward value, which serve as training labels, and two separate value models are optimized simultaneously through multi-task learning during online training. Finally, the two reward values are merged and the dialog strategy is updated. The method adopts a reinforcement learning framework and optimizes the dialog strategy through online learning, so rules and strategies need not be designed manually for each domain, and it can adapt to domain information structures of varying complexity and to data of different scales. Splitting the original task of optimizing a single accumulated reward value into two tasks and optimizing them simultaneously with multi-task learning yields a better learned network structure and lowers the variance of the training process.
Owner:AISPEECH CO LTD
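
The reward split described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the reward magnitudes (`turn_penalty`, `success_bonus`, `failure_penalty`), the linear value heads in the hypothetical `TwoHeadValue` class, and the feature vector are all invented for the example. It shows why merging is sound: discounted returns are linear in the reward, so the turn-count return plus the success return equals the return of the original single reward.

```python
from dataclasses import dataclass
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one dialog."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def split_rewards(num_turns: int, success: bool,
                  turn_penalty: float = -1.0,
                  success_bonus: float = 20.0,
                  failure_penalty: float = -10.0) -> Tuple[List[float], List[float]]:
    """Hypothetical split of one dialog's reward into a per-turn
    (turn-count) stream and a terminal success stream."""
    turn_rewards = [turn_penalty] * num_turns
    success_rewards = [0.0] * num_turns
    success_rewards[-1] = success_bonus if success else failure_penalty
    return turn_rewards, success_rewards


@dataclass
class TwoHeadValue:
    """Two linear value heads over the same features, trained jointly (hypothetical)."""
    w_turn: List[float]
    w_success: List[float]
    lr: float = 0.05

    def predict(self, x: List[float]) -> Tuple[float, float]:
        v_turn = sum(w * xi for w, xi in zip(self.w_turn, x))
        v_success = sum(w * xi for w, xi in zip(self.w_success, x))
        return v_turn, v_success

    def merged(self, x: List[float]) -> float:
        # Recombine the two estimates into one value for the strategy update.
        v_turn, v_success = self.predict(x)
        return v_turn + v_success

    def update(self, x: List[float], target_turn: float, target_success: float) -> None:
        # Multi-task loss = squared error of each head; one gradient step per head.
        v_turn, v_success = self.predict(x)
        err_turn = target_turn - v_turn
        err_success = target_success - v_success
        self.w_turn = [w + self.lr * err_turn * xi for w, xi in zip(self.w_turn, x)]
        self.w_success = [w + self.lr * err_success * xi for w, xi in zip(self.w_success, x)]


turn_r, succ_r = split_rewards(num_turns=4, success=True)
g_turn = discounted_returns(turn_r, gamma=0.9)
g_succ = discounted_returns(succ_r, gamma=0.9)
g_total = discounted_returns([a + b for a, b in zip(turn_r, succ_r)], gamma=0.9)

model = TwoHeadValue(w_turn=[0.0, 0.0], w_success=[0.0, 0.0])
features = [1.0, 0.5]  # hypothetical state/action feature vector
for _ in range(500):
    model.update(features, g_turn[0], g_succ[0])
```

Each head sees a lower-variance target than the combined return would give, which is one plausible reading of the variance reduction the abstract claims.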

Multi-agent action strategy learning method and device, medium and computing equipment

The embodiment of the invention provides a multi-agent action strategy learning method comprising the following steps: multiple agents sample their respective actions according to their initial action strategies; the advantage each agent obtains from executing its action is estimated; and each agent's action strategy is updated based on these advantages, so that each updated strategy enables the corresponding agent to obtain a higher return. The method is applied to task-oriented machine learning scenarios and trains several cooperative agents at once (that is, several action strategies are trained simultaneously). The agents do not need to interact with a pre-built simulator and no manual supervision is required, which greatly saves time and resources. In addition, to enable every agent to learn a good action strategy, different rewards are assigned to different agents, so that the agents can learn better action strategies.
Owner:TSINGHUA UNIV
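
The sample, estimate-advantage, update loop in the abstract above can be illustrated with a small REINFORCE-with-baseline sketch. Everything concrete here is an assumption for illustration: each agent has a single-state softmax strategy, the advantage is the reward minus a running-mean baseline, and the hypothetical `team_rewards` function gives each agent its own reward plus a shared bonus, standing in for the "different rewards" the abstract mentions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent:
    """One agent with its own softmax action strategy and reward baseline."""

    def __init__(self, n_actions: int, lr: float = 0.1):
        self.logits = [0.0] * n_actions
        self.baseline = 0.0
        self.lr = lr

    def sample(self, rng: random.Random) -> int:
        return rng.choices(range(len(self.logits)), weights=softmax(self.logits))[0]

    def update(self, action: int, reward: float) -> None:
        advantage = reward - self.baseline  # estimated advantage of this action
        probs = softmax(self.logits)
        for a in range(len(self.logits)):
            # Gradient of log pi(action) w.r.t. logit a is 1[a == action] - pi(a).
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.logits[a] += self.lr * advantage * grad
        self.baseline += 0.05 * (reward - self.baseline)  # running-mean baseline


def team_rewards(actions: List[int]) -> List[float]:
    """Hypothetical cooperative task: agent i is rewarded for choosing action i,
    and every agent gets a shared bonus when all of them succeed."""
    individual = [1.0 if a == i else 0.0 for i, a in enumerate(actions)]
    shared = 0.5 if all(individual) else 0.0
    return [r + shared for r in individual]


def train(n_agents: int = 3, n_actions: int = 3, steps: int = 2000, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    agents = [Agent(n_actions) for _ in range(n_agents)]
    for _ in range(steps):
        actions = [agent.sample(rng) for agent in agents]  # joint action
        rewards = team_rewards(actions)                    # distinct reward per agent
        for agent, a, r in zip(agents, actions, rewards):
            agent.update(a, r)
    return agents


agents = train()
```

Giving each agent an individual reward term, rather than only the shared bonus, lets each one tell whether its own action helped; that is the credit-assignment problem the abstract's per-agent rewards address.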

Multi-vehicle collaborative planning method based on distributed crowd-sourcing learning

The invention discloses a multi-vehicle collaborative planning method based on distributed crowd-sourcing learning, belonging to the technical field of multi-vehicle and vehicle-road collaborative decision making. Edge servers are used to reduce the computing and communication capability required of each vehicle. An evolutionary game models the continuous game between vehicles during route planning; when the game reaches a stable state, each vehicle obtains the routing decision that maximizes its own benefit. An intersection driving decision module is deployed on each vehicle, which is treated as an independent decision maker, and the cooperative driving behavior of multiple vehicles at the intersection is modeled using the strong strategy learning capability of deep reinforcement learning. A traffic situation prediction module is deployed at the roadside edge and uses multi-vehicle and vehicle-road communication to extend traffic situation awareness beyond each vehicle's limited field of view. The invention optimizes road resources in several respects: it improves the spatio-temporal utilization of the intersection and of the surrounding roads, and it increases intersection throughput.
Owner:BEIJING UNIV OF POSTS & TELECOMM
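
One common way to model the evolutionary routing game in the abstract above is replicator dynamics, sketched below. The abstract does not say which dynamics the patent uses, so this is an assumption, as are the linear congestion model in `route_costs` and the route parameters. Routes whose payoff (negative travel time) beats the population average gain share until the used routes have equal cost. That is the stable state in which no vehicle can improve by switching.

```python
from typing import List


def route_costs(shares: List[float], free_time: List[float],
                congestion: List[float], n_vehicles: int) -> List[float]:
    """Hypothetical linear travel-time model: cost grows with route load."""
    return [t + c * s * n_vehicles for s, t, c in zip(shares, free_time, congestion)]


def replicator_routing(free_time: List[float], congestion: List[float],
                       n_vehicles: int = 100, step: float = 0.01,
                       iters: int = 5000, tol: float = 1e-9) -> List[float]:
    k = len(free_time)
    shares = [1.0 / k] * k  # start from a uniform mix of routes
    for _ in range(iters):
        costs = route_costs(shares, free_time, congestion, n_vehicles)
        payoff = [-c for c in costs]  # benefit = negative travel time
        avg = sum(s * p for s, p in zip(shares, payoff))
        # Replicator update: routes that beat the population average grow.
        new = [max(s + step * s * (p - avg), 0.0) for s, p in zip(shares, payoff)]
        total = sum(new)
        new = [x / total for x in new]
        if max(abs(a - b) for a, b in zip(new, shares)) < tol:
            return new
        shares = new
    return shares


shares = replicator_routing([10.0, 15.0], [0.1, 0.1])
costs = route_costs(shares, [10.0, 15.0], [0.1, 0.1], 100)
```

With these numbers the fixed point puts 75% of vehicles on the faster route, where both routes cost 17.5 time units.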

Object access strategy configuration method and device

The invention relates to an object access strategy configuration method and device. The method comprises the following steps: sending a strategy acquisition instruction to a first terminal, the instruction directing the first terminal to obtain a target control strategy from a trusted management server, wherein the target control strategy specifies the control operation to perform on behavior that accesses a target object and is generated by a first strategy learning process executed on a second terminal, the first strategy learning process being a process of learning from a first access log of the target object on the second terminal; upon receiving a strategy obtaining request sent by the first terminal, sending the target control strategy to the first terminal in response to the request; and receiving strategy effective information sent by the first terminal, indicating that the target control strategy has been confirmed as effective on the first terminal. The application addresses the relatively low efficiency of configuring object access strategies in the related art.
Owner:BEIJING KEXIN HUATAI INFORMATION TECH
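
The message flow in the abstract above (learn on the second terminal, instruct, request, respond, confirm) can be traced with a toy in-process sketch. It makes several assumptions: the classes are hypothetical, the "strategy learning" rule in `learn_policy` (allow processes seen at least `min_count` times in the access log) is invented for illustration, and for brevity a single `TrustedServer` object plays both the configuring device and the trusted management server.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ControlPolicy:
    """Target control strategy: which processes may access the target object."""
    target_object: str
    allowed_processes: Set[str]

    def permits(self, process: str) -> bool:
        return process in self.allowed_processes


def learn_policy(target_object: str, access_log: List[str], min_count: int = 2) -> ControlPolicy:
    """Hypothetical strategy learning on the second terminal: allow every
    process that accessed the object at least `min_count` times in the log."""
    counts = Counter(access_log)
    allowed = {proc for proc, n in counts.items() if n >= min_count}
    return ControlPolicy(target_object, allowed)


@dataclass
class FirstTerminal:
    name: str
    active: Dict[str, ControlPolicy] = field(default_factory=dict)

    def on_acquire_instruction(self, server: "TrustedServer", target_object: str) -> dict:
        # Send a strategy obtaining request to the trusted server.
        policy = server.handle_request(target_object)
        # Apply the strategy locally, then report that it is effective.
        self.active[target_object] = policy
        return {"terminal": self.name, "object": target_object, "effective": True}


@dataclass
class TrustedServer:
    policies: Dict[str, ControlPolicy] = field(default_factory=dict)
    confirmations: List[dict] = field(default_factory=list)

    def publish(self, policy: ControlPolicy) -> None:
        # Store a strategy learned on the second terminal.
        self.policies[policy.target_object] = policy

    def handle_request(self, target_object: str) -> ControlPolicy:
        # Respond to a strategy obtaining request with the stored strategy.
        return self.policies[target_object]

    def configure(self, terminal: FirstTerminal, target_object: str) -> dict:
        # Send the strategy acquisition instruction and record the
        # strategy effective information that comes back.
        confirmation = terminal.on_acquire_instruction(self, target_object)
        self.confirmations.append(confirmation)
        return confirmation


log = ["backup.exe", "backup.exe", "editor.exe", "unknown.exe", "editor.exe"]
server = TrustedServer()
server.publish(learn_policy("/data/ledger.db", log))
terminal = FirstTerminal("terminal-1")
confirmation = server.configure(terminal, "/data/ledger.db")
```

The efficiency gain the abstract claims comes from reuse: a strategy learned once on one terminal is pushed to others instead of being configured by hand on each.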

Ceph system performance optimization strategy and system based on deep reinforcement learning

The invention discloses a Ceph system performance optimization system based on deep reinforcement learning, composed of a data source module, a data access mode learning module, an evaluation mechanism learning module and a system parameter adjustment learning module. The corresponding optimization strategy is realized through the following steps: S1, preprocessing the data source; S2, learning and classifying the running-environment model of the Ceph file system; S3, performing evaluation mechanism learning; and S4, learning a Ceph file system parameter adjustment strategy. The method combines a deep reinforcement learning algorithm with interactive learning between an A2C model and the Ceph file system to obtain optimized parameters, so that the optimal system parameters for a given data access mode can be selected. Because the optimal parameters are obtained through learning, the method adapts to different data access modes and hardware configurations, and the performance of the Ceph file system is improved.
Owner:STATE GRID ANHUI ELECTRIC POWER +1
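
The core idea of steps S2 to S4, an actor-critic that picks parameters per data access mode, can be shown on a heavily simplified problem. This sketch is an assumption throughout: it reduces A2C to a one-step advantage actor-critic on a contextual bandit, the configuration bundles in `CONFIGS` and the modes in `MODES` are hypothetical labels rather than real Ceph options, and `measured_throughput` is a toy reward, not a model of Ceph.

```python
import math
import random
from typing import Dict, List

CONFIGS = ["small_io_tuned", "balanced", "large_io_tuned"]  # hypothetical parameter bundles
MODES = ["random_small", "sequential_large"]                 # hypothetical access modes


def measured_throughput(mode: str, config: int, rng: random.Random) -> float:
    """Toy stand-in for benchmarking the file system; not a Ceph model."""
    best = 0 if mode == "random_small" else 2
    return 1.0 - 0.4 * abs(config - best) + rng.gauss(0.0, 0.05)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCritic:
    """Per-mode softmax actor plus a per-mode state-value critic."""

    def __init__(self, lr_actor: float = 0.2, lr_critic: float = 0.1):
        self.logits: Dict[str, List[float]] = {m: [0.0] * len(CONFIGS) for m in MODES}
        self.value: Dict[str, float] = {m: 0.0 for m in MODES}
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic

    def act(self, mode: str, rng: random.Random) -> int:
        probs = softmax(self.logits[mode])
        return rng.choices(range(len(CONFIGS)), weights=probs)[0]

    def update(self, mode: str, action: int, reward: float) -> None:
        advantage = reward - self.value[mode]           # critic supplies the baseline
        self.value[mode] += self.lr_critic * advantage  # move V(s) toward observed reward
        probs = softmax(self.logits[mode])
        for a in range(len(CONFIGS)):
            grad = (1.0 if a == action else 0.0) - probs[a]  # d log pi(action) / d logit_a
            self.logits[mode][a] += self.lr_actor * advantage * grad

    def best_config(self, mode: str) -> str:
        probs = softmax(self.logits[mode])
        return CONFIGS[probs.index(max(probs))]


rng = random.Random(42)
tuner = ActorCritic()
for _ in range(3000):
    mode = rng.choice(MODES)  # observed (classified) access mode
    action = tuner.act(mode, rng)
    tuner.update(mode, action, measured_throughput(mode, action, rng))
```

A real A2C agent would use neural networks, multi-step returns and a richer state (workload statistics, hardware configuration). The per-mode tables here only make the advantage update easy to follow.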

Task-oriented dialogue strategy generation method

The invention relates to a task-oriented dialogue strategy generation method comprising the following steps: establishing a dialogue state tracker and determining the dialogue state space, the action space and their formal representations; simulating dialogue states with a variational autoencoder; simulating dialogue actions with a multi-layer perceptron and Gumbel-Softmax; performing adversarial training of the simulated-sample generator against a discriminator; and finally training a dialogue strategy with a reinforcement learning method. First, the simulated-sample generator is used to learn a reward function, and the discriminator's loss can be fed back directly to the generator for optimization. Second, the trained discriminator serves as the dialogue reward in the reinforcement learning process, guiding dialogue strategy learning, and any reinforcement learning algorithm can be used to update the dialogue strategy. By learning to distinguish dialogues generated by humans from those generated by the machine, the method infers the common information contained in high-quality human dialogues, and then uses this learned information to guide dialogue strategy learning in a new domain through transfer learning.
Owner:网经科技(苏州)有限公司 (Wangjing Technology (Suzhou) Co., Ltd.)
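
Two pieces of the method above can be made concrete: Gumbel-Softmax, which makes sampled dialogue acts differentiable so the discriminator's loss can flow back to the generator, and turning a discriminator score into a reward. The Gumbel-Softmax formula is standard. The reward shape `log D - log(1 - D)` is only one common choice and is an assumption here, as are the example logits standing in for the MLP's output.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot action: softmax((logits + g) / tau), g ~ Gumbel(0, 1)."""
    noisy = []
    for z in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        g = -math.log(-math.log(u))   # inverse-CDF Gumbel sample
        noisy.append((z + g) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_action(soft: List[float]) -> int:
    """Discrete dialogue act used at execution time (argmax of the relaxed sample)."""
    return soft.index(max(soft))


def discriminator_reward(d_human: float, eps: float = 1e-8) -> float:
    """Assumed reward shape: log D - log(1 - D), where D is the discriminator's
    probability that a (state, action) pair came from a human dialogue."""
    d = min(max(d_human, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


rng = random.Random(7)
act_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]  # hypothetical MLP output
counts = [0, 0, 0]
for _ in range(5000):
    counts[hard_action(gumbel_softmax(act_logits, tau=0.5, rng=rng))] += 1
```

The argmax of the noisy logits is an exact sample from the softmax distribution (the Gumbel-max trick), so about 70% of the hard actions here pick act 0. The temperature `tau` only controls how close the relaxed vector is to one-hot.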

Large-scale unmanned aerial vehicle cluster flight method based on deep reinforcement learning

The invention discloses a large-scale unmanned aerial vehicle (UAV) cluster flight method based on deep reinforcement learning. The learning process for the UAV cluster's anti-collision strategy is divided into a sequence of courses, each using a larger cluster than the previous one. A curriculum reinforcement learning framework is built on an actor (executor) network and a critic (evaluator) network, and an attention-based group constant network is set up within the framework. Strategy learning is carried out on each course in turn under this framework to obtain the flight strategy of each UAV, and the actor and critic network parameters for the current course are updated using each UAV's experience data from the preceding course. The invention improves the learning and training efficiency for large-scale UAV clusters, effectively avoids collisions within the cluster during flight, and generalizes well.
Owner:NAT UNIV OF DEFENSE TECH
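
The abstract above does not define the attention-based "group constant network". This sketch assumes it is a component whose output size stays fixed as the cluster grows, so the same actor and critic can be carried from one course to the next. Under that assumption, `attention_pool` shows scaled dot-product attention over neighbor observations with no learned query, key or value projections, which a real network would add. `run_curriculum` is a hypothetical driver that enforces the increasing cluster sizes and hands each course's state (parameters and experience) to the next course.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Vector = List[float]
State = TypeVar("State")


def attention_pool(own: Vector, neighbors: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention: the UAV's own observation is the query,
    neighbor observations are keys and values. Output length does not depend
    on the number of neighbors, and reordering neighbors leaves it unchanged."""
    if not neighbors:
        return [0.0] * len(own)
    scale = math.sqrt(len(own))
    scores = [sum(a * b for a, b in zip(own, n)) / scale for n in neighbors]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * n[i] for w, n in zip(weights, neighbors)) / total
            for i in range(len(own))]


def run_curriculum(cluster_sizes: List[int],
                   train_course: Callable[[int, State], State],
                   init_state: State) -> Tuple[State, List[int]]:
    """Train courses in order of strictly increasing cluster size; each course
    starts from the state (network parameters, experience) the previous one left."""
    if any(b <= a for a, b in zip(cluster_sizes, cluster_sizes[1:])):
        raise ValueError("each course must use a larger cluster than the previous one")
    state = init_state
    completed = []
    for size in cluster_sizes:
        state = train_course(size, state)
        completed.append(size)
    return state, completed
```

For example, `attention_pool([1.0, 0.0], neighbors)` returns a 2-element vector whether `neighbors` holds two observations or twenty. That fixed size is what lets the network trained on a small cluster start the next, larger course without changing shape.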