183 results about "Reward value" patented technology


Mobile robot path planning method with combination of depth automatic encoder and Q-learning algorithm

CN105137967A (Active) · Effects: achieve cognition; improve the ability to process images · Classifications: biological neural network models; position/course control in two dimensions · Concepts: algorithm; reward value
The invention provides a mobile robot path planning method that combines a deep autoencoder with the Q-learning algorithm. The method comprises three parts: a deep autoencoder, a BP neural network, and a reinforcement learning component. The deep autoencoder processes images of the environment in which the robot is positioned to extract image features, laying the foundation for subsequent environment cognition. The BP neural network fits reward values to the image feature data, coupling the deep autoencoder with reinforcement learning. The Q-learning algorithm acquires knowledge in an action-evaluation setting through interactive learning with the environment and improves the action scheme to suit the environment and achieve the desired goal. The robot interacts with the environment to learn autonomously and finally finds a feasible path from the start point to the end point. Combining the deep autoencoder with the BP neural network enhances the system's image processing capacity and realizes environment cognition.
Owner:BEIJING UNIV OF TECH
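
The patent does not publish its update equations; as a minimal sketch of the Q-learning component it describes, the following tabular update on a toy grid uses the standard rule Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. The grid layout, goal reward, and hyperparameters are illustrative assumptions, not values from the patent, and the real method derives states from autoencoder image features rather than coordinates.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 grid; the patent's states come from autoencoder
# features of camera images, not raw coordinates.
SIZE, GOAL = 4, (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2             # assumed hyperparameters

Q = defaultdict(float)  # Q[(state, action)] -> value

def step(state, action):
    """Move in the grid; reward +1 at the goal, 0 elsewhere."""
    r, c = state
    nr = min(max(r + action[0], 0), SIZE - 1)
    nc = min(max(c + action[1], 0), SIZE - 1)
    nxt = (nr, nc)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for episode in range(500):
    state, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # standard Q-learning temporal-difference update
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
```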

Dialog strategy online realization method based on multi-task learning

The invention discloses an online dialog strategy realization method based on multi-task learning. Corpus information of a man-machine dialog is acquired in real time, current user state features and user action features are extracted, and training inputs are constructed from them. The single accumulated reward value used in dialog strategy learning is then split into a dialog-turn-count reward value and a dialog-success reward value, which serve as training annotations; during online training, two different value models are optimized simultaneously through multi-task learning. Finally the two reward values are merged and the dialog strategy is updated. The method adopts a reinforcement learning framework and optimizes the dialog strategy through online learning, so no rules or strategies need to be designed manually for each domain, and the method adapts to domain information structures of varying complexity and to data of different scales. Splitting the original single-accumulated-reward task and optimizing the parts jointly with multi-task learning yields a better network structure and lowers the variance of the training process.
Owner:AISPEECH CO LTD
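
The abstract describes splitting the cumulative reward into a turn-count reward and a success reward and fitting a separate value model to each. A minimal sketch of that idea, assuming a shared feature encoder with two value heads; the layer sizes, feature dimension, and loss weighting are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class TwoHeadValueNet(nn.Module):
    """Shared encoder with one head per reward component."""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.turn_head = nn.Linear(hidden, 1)     # predicts turn-count reward
        self.success_head = nn.Linear(hidden, 1)  # predicts success reward

    def forward(self, x):
        h = self.encoder(x)
        return self.turn_head(h), self.success_head(h)

net = TwoHeadValueNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Dummy batch: state/action features plus the two reward annotations.
feats = torch.randn(16, 32)
turn_reward = torch.randn(16, 1)     # e.g. negative per extra dialog turn
success_reward = torch.randn(16, 1)  # e.g. +1 on task success

turn_pred, success_pred = net(feats)
# Multi-task loss: optimize both value models simultaneously.
loss = nn.functional.mse_loss(turn_pred, turn_reward) \
     + nn.functional.mse_loss(success_pred, success_reward)
opt.zero_grad()
loss.backward()
opt.step()

# The merged value used to update the dialog strategy is the sum of heads.
merged_value = (turn_pred + success_pred).detach()
```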

Resource allocation method based on multi-agent reinforcement learning in mobile edge computing system

The invention discloses a resource allocation method based on multi-agent reinforcement learning in a mobile edge computing system, comprising the following steps: (1) dividing a wireless channel into a plurality of subcarriers, where each user can select only one subcarrier; (2) letting each user randomly select a channel and computing resources, then calculating the time delay and energy consumption generated by the user's offloading; (3) comparing the delay and energy consumption of local computation with those of offloading to the edge cloud, and judging whether the offloading succeeds; (4) obtaining a reward value for the current offloading action through multi-agent reinforcement learning and calculating a value function; (5) having the user select actions according to the strategy function; and (6) varying the user's learning rate to update the strategy and obtain the optimal action set. Based on variable-rate multi-agent reinforcement learning, the computing and wireless resources of the mobile edge server are fully utilized, and each intelligent terminal's utility function is maximized while the necessity of user offloading is taken into account.
Owner:SOUTHEAST UNIV
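
The abstract leaves the delay, energy, and reward formulas unspecified. A common formulation in the MEC literature, sketched below with illustrative constants and weights (all assumptions, not patent values), compares a weighted local cost against a weighted offloading cost and rewards the saving, matching steps (2)-(4):

```python
def local_cost(cycles, f_local, kappa=1e-27, w_t=0.5, w_e=0.5):
    """Weighted delay + energy when the task is computed locally."""
    delay = cycles / f_local
    energy = kappa * cycles * f_local ** 2  # common CMOS energy model
    return w_t * delay + w_e * energy

def offload_cost(bits, rate, cycles, f_edge, p_tx=0.1, w_t=0.5, w_e=0.5):
    """Weighted delay + energy when the task is offloaded over the channel."""
    tx_delay = bits / rate
    exec_delay = cycles / f_edge
    tx_energy = p_tx * tx_delay
    return w_t * (tx_delay + exec_delay) + w_e * tx_energy

def reward(bits, rate, cycles, f_local, f_edge):
    """Positive reward iff offloading beats local execution (steps 3-4)."""
    saving = local_cost(cycles, f_local) - offload_cost(bits, rate, cycles, f_edge)
    return saving if saving > 0 else -1.0  # assumed penalty for failed offload

# Example: 1 Mb task, 2 Mbps link, 1e9 CPU cycles, 1 GHz local vs 10 GHz edge
print(reward(1e6, 2e6, 1e9, 1e9, 1e10))
```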

Robot motion decision-making method, system and device introducing emotion regulation and control mechanism

The invention belongs to the field of intelligent robots and particularly relates to a robot motion decision-making method, system and device that introduce an emotion regulation and control mechanism, aiming to improve robot decision-making speed and learning efficiency. The method comprises the following steps: generating a predicted state value for the next moment from the current action variable and state value using an environmental perception model; updating a state-action value function network based on action variables, state values and immediate rewards; obtaining a prediction trajectory from the environmental perception model, computing a locally optimal solution for the trajectory by differential dynamic programming, and obtaining a model-based optimal decision; acquiring a model-free decision from the current state and strategy by minimizing the state-action value function; and, based on the state prediction error, the reward prediction error and the average reward value, generating an emotion response signal through a computable emotion-processing model and selecting the path decision according to a threshold on that signal. Decision-making speed is gradually increased while learning efficiency is maintained.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
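
The abstract names the three inputs to the emotion model (state prediction error, reward prediction error, average reward) and a threshold test, but not the functional form. One plausible sketch, with an assumed weighted-sum emotion signal and entirely hypothetical weights and threshold:

```python
def emotion_signal(state_pred_err, reward_pred_err, avg_reward,
                   w=(0.5, 0.3, 0.2)):
    """Assumed form: weighted combination of the three quantities the
    abstract names; high surprise or low average reward raises the signal."""
    return w[0] * state_pred_err + w[1] * reward_pred_err - w[2] * avg_reward

def select_decision(model_based, model_free, signal, threshold=0.5):
    """Gate between the two decisions on the emotion signal (hypothetical
    rule: a calm agent trusts its learned world model, an agitated one
    falls back on the model-free policy)."""
    return model_based if signal < threshold else model_free

# Toy usage with made-up numbers
sig = emotion_signal(state_pred_err=0.4, reward_pred_err=0.2, avg_reward=0.6)
decision = select_decision("ddp_trajectory", "policy_action", sig)
print(sig, decision)
```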

Neural network reinforcement learning control method of autonomous underwater robot

The invention provides a neural network reinforcement learning control method for an autonomous underwater robot. The method comprises the steps of: obtaining the current pose information of the autonomous underwater vehicle (AUV); computing the state quantity, feeding the state into a reinforcement learning neural network to calculate Q values by forward propagation, and deriving the controller parameters by selecting an action A; feeding the control parameters and the control deviation into the controller and computing the control output; having the robot perform thrust allocation according to the arrangement of its actuators; and calculating a reward value from the control response, iterating the reinforcement learning, and updating the parameters of the reinforcement learning neural network. By combining reinforcement learning with a traditional control method, the AUV judges its own motion performance during navigation and adjusts its controller performance online according to the experience generated in motion, adapting to complex environments faster through self-learning and thus obtaining better control precision and stability.
Owner:HARBIN ENG UNIV
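
The abstract describes selecting controller parameters from Q values and scoring them by the control response, but gives neither the action set nor the reward. A toy sketch under those assumptions, simplified to a bandit over a hypothetical discrete set of PID gain triples, with reward taken as the negative residual tracking error (all values made up):

```python
import random

# Hypothetical discrete action set: candidate (Kp, Ki, Kd) gain triples.
GAIN_SETS = [(1.0, 0.0, 0.1), (2.0, 0.1, 0.2), (4.0, 0.2, 0.5)]
Q = [0.0] * len(GAIN_SETS)
ALPHA, EPS = 0.1, 0.2  # assumed learning rate and exploration rate

def control_response(gains, deviation):
    """Stand-in for the AUV plant: one proportional step on the deviation.
    A real system would integrate vehicle dynamics and thrust allocation."""
    kp, ki, kd = gains
    output = kp * deviation          # P term only in this toy model
    return deviation - 0.1 * output  # residual error after the step

for step in range(200):
    deviation = random.uniform(-1.0, 1.0)   # current control deviation
    # epsilon-greedy choice of controller parameters from Q values
    a = random.randrange(len(GAIN_SETS)) if random.random() < EPS \
        else max(range(len(GAIN_SETS)), key=Q.__getitem__)
    residual = control_response(GAIN_SETS[a], deviation)
    reward = -abs(residual)                 # assumed reward: small residual error
    Q[a] += ALPHA * (reward - Q[a])         # bandit-style value update

print("preferred gains:", GAIN_SETS[max(range(len(GAIN_SETS)), key=Q.__getitem__)])
```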

C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

The invention discloses a C-RAN computation offloading and resource allocation method based on deep reinforcement learning in the technical field of mobile communication. The method comprises the following steps: 1) constructing a deep reinforcement learning neural network, and calculating the task data size and the computing resources required to execute each task; 2) inputting the system state into the deep reinforcement learning model and training the neural network to obtain system actions; 3) having the user offload the computation task according to the offloading proportionality coefficient, having the mobile edge computing server execute the task according to the computing resource allocation coefficient, obtaining a reward value for the system action from the reward function, and updating the neural network parameters according to the reward value; and 4) repeating the above steps until the reward value stabilizes, completing the training, and then offloading user computation tasks and allocating the MEC server's computing resources according to the final system action. The method greatly reduces user service time and energy consumption, making real-time low-energy-consumption service possible.
Owner:NANJING UNIV OF POSTS & TELECOMM
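
Step 4's stopping rule ("repeat until the reward value stabilizes") is left informal. One concrete, assumed realization is to stop when a moving average of recent rewards stops changing by more than a tolerance; the window size, tolerance, and the `env_step` placeholder below are all illustrative:

```python
from collections import deque

def train_until_stable(env_step, window=50, tol=1e-3, max_iters=10000):
    """Run training iterations until the moving-average reward stabilizes.
    `env_step` stands in for one DRL iteration (act, observe reward, update
    the network) and must return that iteration's reward value."""
    recent = deque(maxlen=window)
    prev_avg = None
    for i in range(max_iters):
        recent.append(env_step())
        if len(recent) == window:
            avg = sum(recent) / window
            if prev_avg is not None and abs(avg - prev_avg) < tol:
                return i, avg          # reward has stabilized
            prev_avg = avg
    return max_iters, prev_avg

# Toy stand-in: a reward curve that improves and then flattens out.
state = {"t": 0}
def fake_step():
    state["t"] += 1
    return 1.0 - 1.0 / state["t"]

print(train_until_stable(fake_step))
```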

Simulation text medical record generation method and system

The invention provides a simulated text medical record generation method and system. Original medical records are used to generate positive samples. In each cycle, the word vector and the disease tag vector output by the generator serve as inputs, a new word vector is output, and repeating this many times produces a sentence composed of multiple word vectors. As each word vector is generated, the word vector sequence produced so far is taken as the initial state, generator sampling is run repeatedly to produce multiple complete sentences, and the discriminator takes the average of the reward values of all these sentences as the reward value of the current word vector. The generator is updated according to the obtained sentence reward values and word vectors, and the process repeats until convergence. The converged generator produces negative samples, which together with the positive samples form a mixed medical record data set. Taking the disease tag vector and the word vector sequence as input, the probability that each record comes from a real medical record is obtained, the discriminator is updated, and the process repeats until convergence. Because real records involve patient privacy, the simulated text medical records can assist other machine learning tasks and thus facilitate research on disease.
Owner:TSINGHUA UNIV
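
The per-word reward described above is the Monte Carlo rollout scheme familiar from SeqGAN-style training: for each generated prefix, complete it several times with the generator and average the discriminator's scores. A minimal sketch with stub generator and discriminator functions (both hypothetical placeholders; the patent's generator also conditions on a disease tag vector):

```python
import random

VOCAB = ["fever", "cough", "mg", "daily", "normal"]  # toy word set

def generator_sample(prefix, length):
    """Stub generator: complete `prefix` to `length` words by random choice."""
    seq = list(prefix)
    while len(seq) < length:
        seq.append(random.choice(VOCAB))
    return seq

def discriminator_score(sentence):
    """Stub discriminator: probability the sentence is a real record.
    A made-up heuristic so the example runs."""
    return min(1.0, 0.2 + 0.1 * len(set(sentence)))

def token_rewards(sentence, n_rollouts=8):
    """Reward for each word = average discriminator score over n_rollouts
    completions of the prefix ending at that word (the averaging step the
    abstract describes)."""
    rewards = []
    for t in range(1, len(sentence) + 1):
        prefix = sentence[:t]
        scores = [discriminator_score(generator_sample(prefix, len(sentence)))
                  for _ in range(n_rollouts)]
        rewards.append(sum(scores) / n_rollouts)
    return rewards

sent = generator_sample([], 6)
print(sent, token_rewards(sent))
```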

An industrial data generation method based on a neural network model

The invention provides an industrial data generation method based on a neural network model. Following the idea of generative adversarial networks, the method treats time-series generation as a sequential decision-making process and generates a large-scale data set through a discrimination feedback mechanism. The discriminator extracts time-series features and evaluates the importance of each feature to the sequence, measuring the quality of the generated data by training on real samples and generated samples; through temporal-difference learning, it feeds back to the generator a reward value corresponding to the probability of the data generated at each step. The LSTM-based generator is trained with the policy gradient of reinforcement learning, the reward value being provided by the discriminator's return value. The method realizes a new mode of generating big data from small data, improving the effectiveness of data mining analysis in a big data environment, and the overall framework offers reliable performance and good extensibility.
Owner:CHINA UNIV OF PETROLEUM (EAST CHINA)
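
The generator update described above amounts to a REINFORCE-style policy gradient with the discriminator's return as the reward. A minimal sketch of that update for a single generated step; the network sizes, vocabulary, and the scalar reward are illustrative stand-ins, not the patent's configuration:

```python
import torch
import torch.nn as nn

vocab_size, hidden = 20, 16
lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden, batch_first=True)
head = nn.Linear(hidden, vocab_size)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

# One-hot encoding of a dummy prefix of 5 previously generated steps.
prefix = torch.nn.functional.one_hot(
    torch.randint(vocab_size, (1, 5)), vocab_size).float()

out, _ = lstm(prefix)                      # LSTM encodes the sequence so far
logits = head(out[:, -1])                  # distribution over the next value
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                     # generator picks the next step

# In the patent the reward is fed back by the discriminator via TD
# learning; here a stand-in scalar.
reward = 0.7

# REINFORCE: raise log-probability of actions in proportion to reward.
loss = -(dist.log_prob(action) * reward).mean()
opt.zero_grad()
loss.backward()
opt.step()
```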

Image fine-grained recognition method based on reinforcement learning strategy

The invention provides a fine-grained recognition method based on reinforcement learning and cross-bilinear features, aiming to solve the difficulty of mining the most discriminative regions of a fine-grained image. An Actor-Critic strategy is used to mine the most attention-worthy regions of an image. The Actor module generates the top M candidate regions with the best discrimination capability. The Critic module evaluates the state value of the action using the cross-bilinear feature, then calculates the reward value of the action in the current state using a ranking-style reward, obtains the value advantage, and feeds it back to the Actor module to update the output of the most attention-worthy regions. Finally, the fine-grained category is predicted using the most discriminative regions combined with the original image features. The method mines the most attention-worthy regions of a fine-grained image more effectively; experiments verify that its recognition accuracy on the CUB-200-2011 public data set improves over existing methods, achieving high fine-grained recognition accuracy.
Owner:SOUTHEAST UNIV
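
The abstract gives no explicit formula for the ranking-style reward or the value advantage fed back to the Actor. One common reading, sketched with assumed definitions: rank the M candidate regions by the classifier's confidence on the true class, reward by rank, and subtract the Critic's value estimate. The confidences and value estimates below are made up:

```python
def ranking_rewards(confidences):
    """Assumed ranking reward: the region whose crop yields the highest
    true-class confidence gets reward 1.0, the next 0.5, then 1/3, ...
    (reward depends only on rank, not on the raw confidence)."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    rewards = [0.0] * len(confidences)
    for rank, idx in enumerate(order):
        rewards[idx] = 1.0 / (rank + 1)
    return rewards

def advantages(rewards, critic_values):
    """Advantage = reward minus the Critic's value estimate; this is the
    quantity fed back to update the Actor's region proposals."""
    return [r - v for r, v in zip(rewards, critic_values)]

# Toy numbers: true-class confidence of M=3 candidate regions, and the
# Critic's value estimates for those regions.
conf = [0.42, 0.81, 0.63]
vals = [0.40, 0.60, 0.55]
r = ranking_rewards(conf)
print(r, advantages(r, vals))
```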

Stock investment method based on weighted dense connection convolution neural network deep learning

The invention relates to a stock investment method based on deep learning with a weighted densely connected convolutional neural network. Features are extracted from the input stock data through weighted dense convolutional connections; cross-layer connections give the feature maps of different layers distinct initial weights that are dynamically adjusted during training, so the feature maps are used more effectively, information flow between all layers of the network is increased, and the difficulty of training very deep networks caused by vanishing gradients is alleviated to some extent. Based on the Q values output by the weighted densely connected convolutional network, an appropriate stock trading action is selected and a corresponding reward value is obtained; the reward value and states are stored in an experience pool, and at training time batches are randomly sampled from the experience pool, with the weighted densely connected convolutional neural network approximating the Q-value function of the Q-learning algorithm. By directly learning the environmental factors of the stock market, a trading decision is given directly.
Owner:NANJING UNIV OF INFORMATION SCI & TECH
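
The experience-pool mechanics described above are standard DQN-style replay. A minimal sketch, with a dict standing in for the weighted densely connected CNN (whose architecture the abstract does not fully specify) and made-up market states and rewards:

```python
import random
from collections import deque, defaultdict

ACTIONS = ["buy", "hold", "sell"]
GAMMA, ALPHA = 0.99, 0.1           # assumed discount and learning rate
replay = deque(maxlen=10000)       # the experience pool
Q = defaultdict(float)             # stand-in for the dense-connection CNN

def store(state, action, reward, next_state):
    """Store one trading transition in the experience pool."""
    replay.append((state, action, reward, next_state))

def train_batch(batch_size=32):
    """Randomly sample a batch and move Q toward the Q-learning target
    r + gamma * max_a' Q(s', a')."""
    if len(replay) < batch_size:
        return
    for s, a, r, s2 in random.sample(replay, batch_size):
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Toy usage: states are made-up market regimes, rewards made-up returns.
for _ in range(200):
    s, s2 = random.choice(["up", "down"]), random.choice(["up", "down"])
    a = random.choice(ACTIONS)
    store(s, a, random.uniform(-1, 1), s2)
    train_batch()
```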