Cleaning robot optimal target path planning method based on model learning

A technique for cleaning-robot target path planning, in the field of optimal target path planning for cleaning robots, achieving improved model-learning efficiency, improved planning speed and accuracy, and broad applicability

Active Publication Date: 2016-07-06
海博(苏州)机器人科技有限公司
Cites: 5 · Cited by: 17


Problems solved by technology

[0007] However, the ...



Examples


Embodiment

[0037] Referring to Figure 1: the black border represents walls, which the robot cannot enter; the two R points are locations with little garbage, giving a reward of 0 when reached as a goal; point G is the location with the most garbage, giving a reward of 50 when reached; every remaining grid cell has a reward of -1.
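The reward structure above can be sketched as follows (a minimal illustration; the grid coordinates of the R and G cells are assumptions, since Figure 1 is not reproduced here):

```python
# Reward model of the grid environment in paragraph [0037]: walls are
# impassable, the two R cells (little garbage) give reward 0, cell G
# (most garbage) gives reward 50, and every other cell gives -1.
# The coordinates below are illustrative; the actual layout is in Figure 1.
R_CELLS = {(1, 3), (3, 1)}   # assumed positions of the two R points
G_CELL = (4, 4)              # assumed position of point G

def reward(cell):
    """Reward received when the robot reaches `cell` as a goal."""
    if cell in R_CELLS:
        return 0     # little garbage
    if cell == G_CELL:
        return 50    # most garbage: the optimal target
    return -1        # ordinary cell: step cost
```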

[0038] Referring to Figure 2, the model-learning-based optimal target path planning method for a cleaning robot in this embodiment comprises the following steps:

[0039] Step 1) Initialize the model: set R(x,u) = R_max and f(x,u,x′) = 1, where R(x,u) is the reward function, f(x,u,x′) is the state transition function, R_max is the maximum reward value, (x,u) is a state-action pair (x a state, u an action), and x′ is the next state reached after executing action u in state x;

[0040] Step 2) Initialize the environment: set the robot's starting position to the upper-left grid cell of the map;

[0041] Step 3) Judge the current exploration completeness (formula not reproduced in this extract), where C(x,u) is the number of times t...
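Steps 1) through 3) can be sketched as follows (a hedged illustration of R-MAX-style optimistic initialization with an exploration-completeness check; the known-ness threshold m = 5 and the deterministic-transition assumption are illustrative, not given in the excerpt):

```python
from collections import defaultdict

R_MAX = 50     # maximum reward value (reward of cell G)
M_KNOWN = 5    # assumed threshold: visits needed before (x, u) is 'known'

# Step 1) Optimistic model initialization: R(x, u) = R_max and
# f(x, u, x') = 1 for every state-action pair, as stated in [0039].
R = defaultdict(lambda: R_MAX)   # reward-function estimate R(x, u)
f = defaultdict(lambda: 1.0)     # state-transition estimate f(x, u, x')
C = defaultdict(int)             # visit counter C(x, u)

def record(x, u, x_next, r):
    """Record one real experience; once (x, u) has been tried M_KNOWN
    times, trust the empirical model instead of the optimistic one."""
    C[(x, u)] += 1
    if C[(x, u)] >= M_KNOWN:         # Step 3) exploration completeness
        R[(x, u)] = r
        f[(x, u, x_next)] = 1.0      # deterministic transitions assumed

def is_known(x, u):
    """True once the pair (x, u) has been sufficiently explored."""
    return C[(x, u)] >= M_KNOWN
```

Unexplored pairs keep the optimistic reward R_MAX, which drives the robot toward unvisited regions; this is the R-MAX exploration mechanism the abstract refers to.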



Abstract

The invention discloses a model-learning-based optimal target path planning method for a cleaning robot, addressing the low efficiency of cleaning robots currently on the market. Building on the Dyna-H algorithm, a Dyna algorithm based on bisimulation metrics and R-MAX is proposed; with this path planning method, the robot can be driven to preferentially treat the places where garbage is likely to be greatest. Within a reinforcement learning architecture, the exploration mechanism of the R-MAX algorithm is combined with the Dyna-H algorithm, and the Euclidean distance measure in Dyna-H is replaced by a bisimulation metric over states, which improves the learning efficiency of the model. The method offers high model-learning efficiency, is applicable to both deterministic and stochastic environments, enables the robot to obtain an accurate environment model quickly and efficiently in complex environments, and plans an optimal path to the places with the most garbage.
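The replacement of Dyna-H's Euclidean heuristic by a bisimulation-style state metric can be sketched as follows (a one-step approximation for a deterministic environment; the discount factor and the exact metric form are assumptions, not the patent's formula):

```python
GAMMA = 0.9  # assumed discount factor

def euclidean(s, t):
    """Plain Dyna-H heuristic: geometric distance between grid cells."""
    return ((s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2) ** 0.5

def bisim_distance(s, t, reward, step, actions):
    """One-step bisimulation-style distance: two states are close when
    every action yields similar immediate rewards AND leads to nearby
    successor states (successor distance approximated by Euclidean)."""
    return max(
        abs(reward(s, u) - reward(t, u))
        + GAMMA * euclidean(step(s, u), step(t, u))
        for u in actions
    )
```

Unlike plain Euclidean distance, this metric treats two cells as similar only when their reward and transition behaviour agree, so states separated by a wall are no longer wrongly considered close.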

Description

Technical field

[0001] The invention relates to a reinforcement learning method in machine learning, and in particular to a model-learning-based optimal target path planning method for a cleaning robot.

Background technique

[0002] Reinforcement learning (RL) is a machine learning method that learns a mapping from environment states to actions. The agent chooses actions that act on the environment; the environment changes state, transitions to a new state, and returns a feedback signal, usually called a reward or reinforcement signal. Through a learning algorithm, the agent uses this signal to reinforce its accumulated experience; its goal is to maximize the cumulative expected reward.

[0003] Traditional reinforcement learning methods learn from the information obtained through the agent's interaction with the environment, continuously updating the value function to approach the optimal solution...
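The "continuously update the value function" step of traditional reinforcement learning mentioned in [0003] can be illustrated with the classic tabular Q-learning rule (a generic sketch, not the patent's method; the learning rate and discount factor are illustrative):

```python
ALPHA, GAMMA = 0.1, 0.9  # illustrative learning rate and discount factor

def q_update(Q, x, u, r, x_next, actions):
    """One tabular Q-learning step: move the estimate Q(x, u) toward
    the observed reward plus the discounted best value of the next state."""
    best_next = max(Q.get((x_next, a), 0.0) for a in actions)
    td_target = r + GAMMA * best_next
    Q[(x, u)] = Q.get((x, u), 0.0) + ALPHA * (td_target - Q.get((x, u), 0.0))
    return Q
```

A model-free learner repeats this update from real experience only; the patent's contribution is to learn an environment model as well, so that planning updates can be generated without further real interaction.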

Claims


Application Information

IPC(8): G06F19/00
CPC: G05D1/02, G16Z99/00
Inventors: 刘全, 周谊成, 朱斐
Owner: 海博(苏州)机器人科技有限公司