Cleaning robot optimal target path planning method based on model learning

A technique for cleaning-robot target path planning, in the field of optimal target path planning for cleaning robots, achieving improved model-learning efficiency, improved planning speed and accuracy, and broad applicability

Active Publication Date: 2016-07-06
海博(苏州)机器人科技有限公司
Cites: 5 · Cited by: 17


Problems solved by technology

[0007] However, the ...



Examples


Embodiment

[0037] Referring to Figure 1: the black border represents walls, which the robot cannot enter; the two R points are locations with little garbage, giving a reward of 0 when reached as a goal; point G is the location with the most garbage, giving a reward of 50 when reached; every remaining grid cell has a reward of -1.
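The reward structure above can be sketched as follows (a minimal illustration; the grid coordinates of the R and G cells are assumptions, since Figure 1 is not reproduced here):

```python
# Reward model of the grid environment in paragraph [0037]: walls are
# impassable, the two R cells (little garbage) give reward 0, cell G
# (most garbage) gives reward 50, and every other cell gives -1.
# The coordinates below are illustrative; the actual layout is in Figure 1.
R_CELLS = {(1, 3), (3, 1)}   # assumed positions of the two R points
G_CELL = (4, 4)              # assumed position of point G

def reward(cell):
    """Reward received when the robot reaches `cell` as a goal."""
    if cell in R_CELLS:
        return 0     # little garbage
    if cell == G_CELL:
        return 50    # most garbage: the optimal target
    return -1        # ordinary cell: step cost
```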

[0038] Referring to Figure 2, the model-learning-based optimal target path planning method for a cleaning robot in this embodiment comprises the following steps:

[0039] Step 1) Initialize the model: set R(x,u) = R_max and f(x,u,x′) = 1, where R(x,u) is the reward function, f(x,u,x′) is the state transition function, R_max is the maximum reward value, (x,u) is a state-action pair (x a state, u an action), and x′ is the next state reached after executing action u in state x;

[0040] Step 2) Initialize the environment: set the robot's starting position to the upper-left grid cell of the map;

[0041] Step 3) Judge the current exploration completeness (formula not reproduced in this extract), where C(x,u) is the number of times t...
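Steps 1) through 3) can be sketched as follows (a hedged illustration of R-MAX-style optimistic initialization with an exploration-completeness check; the known-ness threshold m = 5 and the deterministic-transition assumption are illustrative, not given in the excerpt):

```python
from collections import defaultdict

R_MAX = 50     # maximum reward value (reward of cell G)
M_KNOWN = 5    # assumed threshold: visits needed before (x, u) is 'known'

# Step 1) Optimistic model initialization: R(x, u) = R_max and
# f(x, u, x') = 1 for every state-action pair, as stated in [0039].
R = defaultdict(lambda: R_MAX)   # reward-function estimate R(x, u)
f = defaultdict(lambda: 1.0)     # state-transition estimate f(x, u, x')
C = defaultdict(int)             # visit counter C(x, u)

def record(x, u, x_next, r):
    """Record one real experience; once (x, u) has been tried M_KNOWN
    times, trust the empirical model instead of the optimistic one."""
    C[(x, u)] += 1
    if C[(x, u)] >= M_KNOWN:         # Step 3) exploration completeness
        R[(x, u)] = r
        f[(x, u, x_next)] = 1.0      # deterministic transitions assumed

def is_known(x, u):
    """True once the pair (x, u) has been sufficiently explored."""
    return C[(x, u)] >= M_KNOWN
```

Unexplored pairs keep the optimistic reward R_MAX, which drives the robot toward unvisited regions; this is the R-MAX exploration mechanism the abstract refers to.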



Abstract

The invention discloses a model-learning-based optimal target path planning method for a cleaning robot, addressing the low efficiency of cleaning robots currently on the market. Building on the Dyna-H algorithm, a Dyna algorithm based on bisimulation metrics and R-MAX is proposed; with this path planning method, the robot can be driven to preferentially treat the places where garbage is likely to be greatest. Within a reinforcement learning architecture, the exploration mechanism of the R-MAX algorithm is combined with the Dyna-H algorithm, and the Euclidean distance measure in Dyna-H is replaced by a bisimulation metric over states, which improves the learning efficiency of the model. The method offers high model-learning efficiency, is applicable to both deterministic and stochastic environments, enables the robot to obtain an accurate environment model quickly and efficiently in complex environments, and plans an optimal path to the places with the most garbage.
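The replacement of Dyna-H's Euclidean heuristic by a bisimulation-style state metric can be sketched as follows (a one-step approximation for a deterministic environment; the discount factor and the exact metric form are assumptions, not the patent's formula):

```python
GAMMA = 0.9  # assumed discount factor

def euclidean(s, t):
    """Plain Dyna-H heuristic: geometric distance between grid cells."""
    return ((s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2) ** 0.5

def bisim_distance(s, t, reward, step, actions):
    """One-step bisimulation-style distance: two states are close when
    every action yields similar immediate rewards AND leads to nearby
    successor states (successor distance approximated by Euclidean)."""
    return max(
        abs(reward(s, u) - reward(t, u))
        + GAMMA * euclidean(step(s, u), step(t, u))
        for u in actions
    )
```

Unlike plain Euclidean distance, this metric treats two cells as similar only when their reward and transition behaviour agree, so states separated by a wall are no longer wrongly considered close.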

Description

Technical field

[0001] The invention relates to a reinforcement learning method in machine learning, and in particular to a model-learning-based optimal target path planning method for a cleaning robot.

Background technique

[0002] Reinforcement learning (RL) is a machine learning method that learns a mapping from environment states to actions. The agent chooses actions that act on the environment; the environment changes state, transitions to a new state, and returns a feedback signal, usually called a reward or reinforcement signal. Through a learning algorithm, the agent uses this signal to reinforce its accumulated experience; its goal is to maximize the cumulative expected reward.

[0003] Traditional reinforcement learning methods learn from the information obtained through the agent's interaction with the environment, continuously updating the value function to approach the optimal solution...
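The "continuously update the value function" step of traditional reinforcement learning mentioned in [0003] can be illustrated with the classic tabular Q-learning rule (a generic sketch, not the patent's method; the learning rate and discount factor are illustrative):

```python
ALPHA, GAMMA = 0.1, 0.9  # illustrative learning rate and discount factor

def q_update(Q, x, u, r, x_next, actions):
    """One tabular Q-learning step: move the estimate Q(x, u) toward
    the observed reward plus the discounted best value of the next state."""
    best_next = max(Q.get((x_next, a), 0.0) for a in actions)
    td_target = r + GAMMA * best_next
    Q[(x, u)] = Q.get((x, u), 0.0) + ALPHA * (td_target - Q.get((x, u), 0.0))
    return Q
```

A model-free learner repeats this update from real experience only; the patent's contribution is to learn an environment model as well, so that planning updates can be generated without further real interaction.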

Claims


Application Information

IPC(8): G06F19/00
CPC: G05D1/02, G16Z99/00
Inventors: 刘全, 周谊成, 朱斐
Owner: 海博(苏州)机器人科技有限公司