Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication

NeurIPS 2024 Paper
¹University of Pittsburgh, ²Honda Research Institute USA, ³Carnegie Mellon University

LangGround is a novel computational pipeline for MARL agents to learn human-interpretable communication in ad-hoc human-agent teamwork.

Abstract

Multi-Agent Reinforcement Learning (MARL) methods have shown promise in enabling agents to learn a shared communication protocol from scratch and accomplish challenging team tasks. However, the learned language is usually not interpretable to humans or other agents not co-trained together, limiting its applicability in ad-hoc teamwork scenarios. In this work, we propose a novel computational pipeline that aligns the communication space between MARL agents with an embedding space of human natural language by grounding agent communications on synthetic data generated by embodied Large Language Models (LLMs) in interactive teamwork scenarios. Our results demonstrate that introducing language grounding not only maintains task performance but also accelerates the emergence of communication. Furthermore, the learned communication protocols exhibit zero-shot generalization capabilities in ad-hoc teamwork scenarios with unseen teammates and novel task states. This work presents a significant step toward enabling effective communication and collaboration between artificial agents and humans in real-world teamwork settings.
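
As a rough illustration of the data-collection step described above, the sketch below prompts an LLM for a teammate-directed utterance at each observation and stores the utterance alongside its sentence embedding. The helpers `llm(prompt)` and `embed(text)` are placeholders for any chat-completion call and sentence encoder; neither name comes from the paper.

```python
# Minimal sketch of grounding-data collection, assuming placeholder
# helpers llm(prompt) -> str and embed(text) -> vector; both names
# are illustrative, not the paper's API.
def collect_grounding_data(observations, llm, embed):
    """For each observation, ask an embodied LLM what a helpful teammate
    would say, and keep (observation, utterance, embedding) triples."""
    dataset = []
    for obs in observations:
        prompt = (
            "You are an agent in a collaborative team task. "
            f"Your current observation is: {obs}. "
            "What would you tell your teammates?"
        )
        utterance = llm(prompt)
        dataset.append((obs, utterance, embed(utterance)))
    return dataset
```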

Demo

Task Performance

We first examine whether LangGround allows MARL agents to complete collaborative tasks successfully and to converge quickly to a shared communication protocol. LangGround enables multi-agent teams to achieve performance on par with SOTA multi-agent communication methods. Introducing language grounding as an auxiliary learning objective does not compromise the task utility of the learned communication protocols while providing interpretability.
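
Below is a minimal sketch of what such an auxiliary objective could look like, assuming messages are continuous vectors and targets are frozen sentence embeddings of the LLM-generated utterances; the weight `LAMBDA` and all function names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

LAMBDA = 0.1  # hypothetical weight of the grounding term

def grounding_loss(agent_msgs: torch.Tensor, target_embeds: torch.Tensor) -> torch.Tensor:
    """Cosine distance between emitted messages and the embeddings of
    LLM-generated reference utterances for the same observations."""
    agent_msgs = F.normalize(agent_msgs, dim=-1)
    target_embeds = F.normalize(target_embeds, dim=-1)
    return (1.0 - (agent_msgs * target_embeds).sum(dim=-1)).mean()

def joint_loss(rl_loss: torch.Tensor, agent_msgs, target_embeds) -> torch.Tensor:
    # The task objective is untouched; alignment is purely auxiliary.
    return rl_loss + LAMBDA * grounding_loss(agent_msgs, target_embeds)
```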


Language Alignment

We then analyze the properties of the aligned communication space to assess how closely the learned language matches the target human natural language. The retrieved reference messages accurately describe the agents' observations, indicating that the learned communication space is semantically meaningful and closely aligned with the target embedding space.
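
This kind of analysis can be reproduced with a simple nearest-neighbor lookup; the sketch below, with illustrative names only, retrieves the human utterance whose embedding is most cosine-similar to an emitted message vector.

```python
import numpy as np

def nearest_reference(msg_vec: np.ndarray, ref_embeds: np.ndarray, ref_texts: list[str]) -> str:
    """Return the natural-language utterance whose embedding is closest
    (by cosine similarity) to an agent's emitted message vector."""
    msg = msg_vec / np.linalg.norm(msg_vec)
    refs = ref_embeds / np.linalg.norm(ref_embeds, axis=1, keepdims=True)
    return ref_texts[int(np.argmax(refs @ msg))]
```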


Ad-hoc Teamwork

Finally, we evaluate the performance of LangGround agents in ad-hoc teams with two unseen LLM agents as teammates. LangGround outperforms baseline methods in two of the three evaluation scenarios.
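
One plausible way such mixed teams can operate, sketched under the assumption that the nearest-neighbor decoder above is available: decode each emitted message vector to its closest natural-language utterance and pass it to the LLM teammate as ordinary chat text. `llm_chat` is a stand-in for any chat-completion call, not an API from the paper.

```python
def relay_to_llm_teammate(msg_vec, ref_embeds, ref_texts, llm_chat):
    """Decode a learned message to text (via nearest_reference above)
    and forward it to an unseen LLM teammate as plain chat input."""
    utterance = nearest_reference(msg_vec, ref_embeds, ref_texts)
    return llm_chat(f"Your teammate says: {utterance}")
```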


Video Presentation

BibTeX

@misc{li2024languagegroundedmultiagentcommunication,
      title={Language Grounded Multi-agent Communication for Ad-hoc Teamwork},
      author={Huao Li and Hossein Nourkhiz Mahjoub and Behdad Chalaki and Vaishnav Tadiparthi and Kwonjoon Lee and Ehsan Moradi-Pari and Charles Michael Lewis and Katia P Sycara},
      year={2024},
      eprint={2409.17348},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2409.17348},
}