Viewpoint-Conditioned Legible Motion Planning with Imitation and Reinforcement Learning

AI and Manipulation

In robotics, legible motion planning is crucial for effective human-robot interaction and shared autonomy. It involves designing robotic movements that are easily interpretable by humans, promoting intuitive anticipation of a robot's actions. This is essential in settings where robots and humans work closely together, such as in collaborative manufacturing or healthcare.

legible motion pictorial description

Two key learning-based methods for achieving legible motion are reinforcement learning (RL) and imitation learning (IL).

However, both methods have largely been applied without accounting for the observer's viewpoint, which affects how clearly the robot's intent is communicated. This project aims to integrate viewpoint conditioning into these learning algorithms to better align robotic behavior with human perception.

The project introduces viewpoint conditioning into both RL- and IL-based legible motion planning. The goal is to enhance human-robot collaboration by making robotic actions more predictable and understandable.

System Overview

System Overview diagram

We propose a universal learning-based architecture for legible motion planning that can be applied across both reinforcement learning (RL) and imitation learning (IL) setups, differing primarily in the policy learning pipeline. Our focus is on goal-reaching manipulation tasks where the robot must choose between two objects: a goal and a distraction. To ensure legibility, the robot’s movements must be easily understandable and predictable by humans, indicating which object it aims to reach. Furthermore, the robot’s actions are adjusted based on the human’s perspective of the task environment, whether viewed from above or the side.

In our proposed framework, the policy, trainable via either imitation learning or reinforcement learning, generates legible motions that allow human collaborators to clearly grasp the robot's intent. At each time step, the policy receives as input the state of the robot, the goal, potential distractions that could confuse human collaborators, and the viewpoint of the human collaborators, and it predicts an action in the form of joint torques for the robot to execute. Because our focus is on the planning side, all observations except the robot state are specified manually; in practical applications, these observations could be obtained from a perception module to enable full shared autonomy.
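To make this interface concrete, the sketch below lays out one way such a viewpoint-conditioned policy could be implemented as a small PyTorch module. The input dimensions, hidden sizes, and the one-hot viewpoint encoding are illustrative assumptions, not the exact architecture used in this project.

```python
import torch
import torch.nn as nn

class ViewpointConditionedPolicy(nn.Module):
    """Maps (robot state, goal, distraction, viewpoint) to joint torques.

    All dimensions and the one-hot viewpoint encoding are illustrative
    assumptions; the network actually used in the project may differ.
    """
    def __init__(self, state_dim=14, obj_dim=3, n_views=2, n_joints=7, hidden=256):
        super().__init__()
        in_dim = state_dim + 2 * obj_dim + n_views
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joints),  # one torque command per joint
        )

    def forward(self, robot_state, goal_pos, distraction_pos, viewpoint_onehot):
        # Concatenate all observations into a single feature vector.
        x = torch.cat([robot_state, goal_pos, distraction_pos, viewpoint_onehot], dim=-1)
        return self.net(x)

# Example: one observation, with the side view encoded as [1, 0].
policy = ViewpointConditionedPolicy()
torques = policy(torch.zeros(1, 14), torch.zeros(1, 3),
                 torch.zeros(1, 3), torch.tensor([[1.0, 0.0]]))
```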

Technical Approaches

Reinforcement Learning-Based Approach

Reinforcement learning approach

Simulation setup for RL with the xArm robot. We use MuJoCo as the physics engine and interface it through Gym for the learning algorithms. The blue, green, and red dots represent the positions of the end effector, goal, and distraction, respectively. For human evaluation, we record videos of the same setup from both the side and the top view.

Gym simulation setup
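As an illustration of how the observer's viewpoint can enter the RL objective, the sketch below computes a legibility-style reward by projecting the end effector, goal, and distraction positions into the observer's image plane (the top view discards height, the side view discards depth) and rewarding apparent progress toward the goal and separation from the distraction in that plane. The projection, weights, and function name are assumptions made for illustration, not the exact reward used in training.

```python
import numpy as np

def legibility_reward(ee_pos, goal_pos, distraction_pos, viewpoint):
    """Illustrative reward sketch (assumption): encourage the end effector to
    appear closer to the goal than to the distraction from the observer's view.
    """
    # Project 3D positions onto the observer's image plane:
    # the top view discards height (z), the side view discards depth (y).
    keep = np.array([0, 1]) if viewpoint == "top" else np.array([0, 2])
    ee, goal, distraction = ee_pos[keep], goal_pos[keep], distraction_pos[keep]

    d_goal = np.linalg.norm(ee - goal)
    d_distraction = np.linalg.norm(ee - distraction)

    # Progress toward the goal, plus a margin for staying visibly farther
    # from the distraction than from the goal in the projected view.
    return -d_goal + 0.5 * (d_distraction - d_goal)

# Example: the same 3D configuration scored under both viewpoints.
ee = np.array([0.3, 0.1, 0.4])
goal = np.array([0.5, 0.0, 0.2])
distraction = np.array([0.5, 0.3, 0.2])
print(legibility_reward(ee, goal, distraction, "top"),
      legibility_reward(ee, goal, distraction, "side"))
```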

Imitation Learning-Based Approach

Imitation Learning approach

Toy experimental setup for testing the IL model with the xArm robot. The setup shows a reaching task toward the goal object while legibly avoiding the red distraction object. The model also accommodates different viewpoint conditions; for human evaluation we record the same setup from both the top and the side view.

Experimental setup
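On the IL side, a plain behavior-cloning update is one simple way to fit such a policy to expert demonstrations. The sketch below assumes batches of expert observations and torques stored under the dictionary keys shown; the MSE loss, optimizer usage, and batch layout are illustrative assumptions rather than the project's exact pipeline.

```python
import torch.nn.functional as F

def behavior_cloning_step(policy, batch, optimizer):
    """One behavior-cloning update: regress expert joint torques from the
    viewpoint-conditioned observations (dictionary keys are assumptions).
    """
    pred = policy(batch["robot_state"], batch["goal_pos"],
                  batch["distraction_pos"], batch["viewpoint_onehot"])
    loss = F.mse_loss(pred, batch["expert_torques"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```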

Results

Reinforcement Learning Results

gym_sideview

Side view of reinforcement learning trajectory.

gym_topview

Top view of reinforcement learning trajectory.

Imitation Learning Results

xarm_sideview

Side view of imitation learning legible trajectory.

xarm_topview

Top view of imitation learning legible trajectory.

Evaluation

Since legibility refers to generating motion sequences that are easily understood by humans, designing a metric that accurately reflects human preferences is challenging. It is therefore essential to include human evaluation alongside the metrics used in our RL and IL training processes. Ten participants were asked to watch all 8 videos, pausing each video as soon as they believed they had identified the robot's intention and then reporting which object they thought the robot was targeting. We developed a scoring system that balances how early participants paused with whether they identified the correct object.
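The exact scoring rule is not spelled out here; one plausible form, sketched below purely as an assumption, gives credit only for correct guesses and scales that credit by how early in the video the participant paused.

```python
def legibility_score(correct, pause_time, total_time):
    """Illustrative scoring sketch (assumption): a correct guess earns more
    credit the earlier the participant paused; an incorrect guess earns none.
    """
    if not correct:
        return 0.0
    return 1.0 - pause_time / total_time

# Example: correctly identifying the goal halfway through an 8-second clip scores 0.5.
print(legibility_score(True, 4.0, 8.0))
```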

Evaluation bar graph

Human evaluation results on the recorded legible motions. While the IL-trained agent performed competitively with the expert demonstrations, the RL-trained agent outperformed the others by 15%. Without proper viewpoint conditioning, the RL-trained agent obtained a low score.

Conclusion

Future Work


Read the full report Here and the poster Here!
