The disclosure provides an approach for learning to schedule control fragments for
physics-based virtual character simulations and physical
robot control. Given precomputed tracking controllers, a
simulation application segments the controllers into control fragments and learns a scheduler that selects control fragments at runtime to accomplish a task. In one embodiment, each scheduler may be modeled with a Q-network that maps a high-level representation of the state of the
simulation to a control fragment for execution.
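The disclosure does not specify a particular network architecture; as a minimal sketch, such a Q-network might be implemented as a small multilayer perceptron that outputs one Q-value per control fragment (PyTorch is assumed here, and the feature dimension, layer sizes, and fragment count are illustrative rather than taken from the disclosure):

```python
import torch
import torch.nn as nn

class SchedulerQNetwork(nn.Module):
    """Maps a high-level simulation state to one Q-value per control fragment."""

    def __init__(self, state_dim: int, num_fragments: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_fragments),  # one Q-value per fragment
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# At runtime, the scheduler executes the fragment with the highest Q-value.
q_net = SchedulerQNetwork(state_dim=32, num_fragments=10)  # hypothetical sizes
state = torch.randn(1, 32)  # hypothetical high-level state representation
fragment_index = q_net(state).argmax(dim=-1).item()
```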
In such a case, the deep Q-learning algorithm applied to learn the Q-network schedulers may be adapted to use a reward function that prefers the original controller sequence and an exploration strategy that favors in-sequence control fragments over out-of-sequence control fragments. The resulting modified Q-learning algorithm learns schedulers that follow the original controller sequence most of the time while selecting out-of-sequence control fragments when necessary.
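As a concrete illustration of these two adaptations, the sketch below shows a biased epsilon-greedy selection rule and a shaped reward. The successor convention ((current + 1) % num_fragments denoting the in-sequence fragment), the probabilities, and the bonus value are assumptions made for illustration, not details from the disclosure:

```python
import random

def select_fragment(q_values, current_fragment, epsilon=0.1, in_sequence_prob=0.7):
    """Epsilon-greedy selection whose exploration favors the in-sequence fragment."""
    num_fragments = len(q_values)
    if random.random() >= epsilon:
        # Exploit: pick the fragment with the highest Q-value.
        return max(range(num_fragments), key=lambda i: q_values[i])
    in_sequence = (current_fragment + 1) % num_fragments
    if random.random() < in_sequence_prob:
        return in_sequence                   # biased exploration: stay in sequence
    return random.randrange(num_fragments)   # occasionally try any fragment

def shaped_reward(task_reward, chosen, current_fragment, num_fragments,
                  sequence_bonus=0.5):
    """Reward shaping that prefers following the original controller sequence."""
    in_sequence = (current_fragment + 1) % num_fragments
    return task_reward + (sequence_bonus if chosen == in_sequence else 0.0)
```

Under this kind of scheme, exploration usually advances along the original controller sequence, and the bonus term keeps in-sequence transitions attractive during learning, while both mechanisms still leave room for the scheduler to discover out-of-sequence transitions when they yield higher task reward.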