Data-efficient hierarchical reinforcement learning

A multi-level hierarchical reinforcement learning model combined with off-policy correction addresses the complex multi-level reasoning and wasted resources in robot control tasks, enabling efficient training and robot control in complex environments.

CN117549293B · Active · Publication Date: 2026-06-12 · GOOGLE LLC


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GOOGLE LLC
Filing Date: 2019-05-17
Publication Date: 2026-06-12

Application Information


IPC: B25J9/16; G06N3/092; G06N3/008

CPC: G06N3/092; G06N3/008; B25J9/16; B25J9/1664; B25J9/1697; B25J9/1602; B25J9/1656; G05B2219/39289

AI Tagging

Application Domain

Programme-controlled manipulator · Programme control


Similar Technology Patents

Pre-queueing of order totes
US20260159316A1
Robotic teleoperation system and method based on visual positioning and virtual reality technology
CN117584123B · Relieve stress · Enhanced presence · Programme-controlled manipulator · Image enhancement
Determining a configuration of an articulated structure
US20260158653A1 · Programme-controlled manipulator · Character and pattern recognition · Classical mechanics · Mechanical engineering
Method and system for robot grasping for zero-shot shape reconstruction empowerment
CN122185154A · Programme-controlled manipulator · Image analysis
Systems, apparatus, and methods for robots to learn and perform skills
JP7872768B2 · Programme-controlled manipulator · Programme control


[Figure CN117549293B_ABST]


Abstract

Hierarchical reinforcement learning (HRL) models are trained and/or used for robotic control. An HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to techniques that enable more efficient off-policy training of the higher-level policy model and/or the lower-level policy model. Some of these implementations use an off-policy correction that re-labels the higher-level actions of experience data, generated in the past with a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then used to train the higher-level policy model off-policy. This enables efficient off-policy training even though the lower-level policy model at training time is a different version from the one in use when the experience data was collected.
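The re-labelling of higher-level actions described above can be illustrated with a minimal sketch. The abstract does not say how the modified higher-level action is chosen, so the selection rule here is an assumption: among a few candidate goals, keep the one under which the current lower-level policy best reproduces the logged atomic actions. The names `action_mismatch`, `relabel_goal`, and `lower_policy`, the candidate sampling, and the squared-error score are all hypothetical.

```python
import random


def action_mismatch(states, actions, goal, lower_policy):
    """Sum of squared differences between logged and re-predicted atomic actions."""
    total = 0.0
    for state, logged in zip(states, actions):
        predicted = lower_policy(state, goal)
        total += sum((p - a) ** 2 for p, a in zip(predicted, logged))
    return total


def relabel_goal(states, actions, logged_goal, lower_policy,
                 num_candidates=8, noise=0.5, seed=0):
    """Return a modified higher-level action (goal) for one logged segment.

    Assumption: candidates are the logged goal plus Gaussian perturbations
    of it; the source does not specify how candidates are generated.
    """
    rng = random.Random(seed)
    candidates = [list(logged_goal)]
    for _ in range(num_candidates):
        candidates.append([g + rng.gauss(0.0, noise) for g in logged_goal])
    # Keep the goal that makes the logged actions look most like what the
    # current lower-level policy would do.
    return min(candidates,
               key=lambda g: action_mismatch(states, actions, g, lower_policy))
```

Because the logged goal is always among the candidates, the re-labelled goal never explains the logged actions worse than the original one under this score. The re-labelled transition would then be used to train the higher-level policy model in place of the stale one.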


Claims

1. A method implemented by one or more processors, the method comprising:
identifying a current state observation of a robot;
determining, using a higher-level policy model of a hierarchical reinforcement learning model, a higher-level action for transitioning from the current state observation to a target state observation;
generating an atomic action based on processing the current state observation and the higher-level action using a lower-level policy model of the hierarchical reinforcement learning model;
applying the atomic action to the robot to cause the robot to transition to an updated state;
generating an intrinsic reward for the atomic action, the intrinsic reward being generated based on the updated state and the target state observation; and
training the lower-level policy model based on the intrinsic reward for the atomic action.

2. The method according to claim 1, further comprising: following the training, using the hierarchical reinforcement learning model to control one or more actuators of the attached robot.

3. The method according to claim 1 or claim 2, wherein the robot is a simulated robot.

4. The method according to claim 1 or claim 2, wherein generating the intrinsic reward based on the updated state and the target state observation includes generating the intrinsic reward based on the L2 difference between the updated state and the target state observation.

5. The method according to claim 1 or claim 2, further comprising generating an environmental reward and training the higher-level policy model based on the environmental reward.

6. A method implemented by one or more processors of a robot, the method comprising:
identifying a current state of the robot;
at a first control step, determining, using a higher-level policy model of a hierarchical reinforcement learning model, a higher-level action for transitioning from the current state to a target state;
generating a first lower-level action for the first control step based on processing the current state and the higher-level action using a lower-level policy model of the hierarchical reinforcement learning model;
applying the first lower-level action to the robot to cause the robot to transition to an updated state;
at a second control step that follows the first control step, generating an updated higher-level action, wherein generating the updated higher-level action comprises applying at least the current state, the updated state, and the higher-level action to a transition function;
generating a second lower-level action for the second control step based on processing the updated state and the updated higher-level action using the lower-level policy model; and
applying the second lower-level action to the robot to cause the robot to transition to a further updated state.

7. A robot comprising one or more processors for performing the method of claim 6.

8. A computer-readable storage medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 6.

9. A robot control system comprising one or more processors for performing the method according to any one of claims 1 to 6.
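
The control steps recited in claims 1, 4, and 6 can be sketched as a short loop. This is an illustration under stated assumptions, not the claimed implementation: the higher-level action is treated as a desired offset from the current state, the intrinsic reward is the negative L2 distance (the claims say only that it is based on the L2 difference), and the transition function takes the form `state + goal - updated_state` (the claims name its inputs but not its form). The names `intrinsic_reward`, `goal_transition`, `run_steps`, `lower_policy`, and `apply_action` are hypothetical.

```python
import math


def intrinsic_reward(updated_state, target_state):
    """Negative L2 distance between the updated state and the target state.

    Assumption: the sign convention (closer is better) is not in the claims.
    """
    return -math.dist(updated_state, target_state)


def goal_transition(state, goal, updated_state):
    """Update the higher-level action so it still points at the same target.

    Assumption: goal is an offset, so target = state + goal stays fixed.
    """
    return [s + g - u for s, g, u in zip(state, goal, updated_state)]


def run_steps(state, goal, lower_policy, apply_action, num_steps):
    """Run lower-level control steps under one higher-level action."""
    rewards = []
    for _ in range(num_steps):
        action = lower_policy(state, goal)          # lower-level (atomic) action
        updated = apply_action(state, action)       # robot moves to updated state
        target = [s + g for s, g in zip(state, goal)]
        rewards.append(intrinsic_reward(updated, target))
        goal = goal_transition(state, goal, updated)  # updated higher-level action
        state = updated
    return state, goal, rewards
```

In this sketch the higher-level policy model would supply the initial `goal` at the first control step, and the collected intrinsic rewards would feed the training of the lower-level policy model; both of those parts are left out here.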