Communication is fundamental to effective coordination in Cooperative Multi-Agent Reinforcement Learning (MARL). While existing research focuses on improving the efficiency of agent communication, it often overlooks the challenges of real-world communication scenarios. In practice, communication is subject to various sources of noise and potential attacks, making the robustness of communication-based policies a crucial and pressing issue that demands further exploration.
To address these challenges, a research team led by Yang Yu from LAMDA, Nanjing University published new research on 15 December 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team posits that a robust communication-based policy should withstand scenarios in which every message channel may be perturbed to a different degree at any time, and that an ego system trained alongside auxiliary adversaries can meet this requirement. They therefore propose an adaptable method, Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy.
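As a concrete illustration of this threat model, the following minimal sketch (Python with NumPy; the function and variable names are hypothetical, not taken from the paper) perturbs each message channel independently, with a perturbation budget that may differ across channels and timesteps:

```python
import numpy as np

def perturb_messages(messages, epsilons, attack_mask, rng):
    """Apply bounded per-channel perturbations to agent messages.

    messages:    (n_channels, dim) array of messages in transit
    epsilons:    (n_channels,) per-channel perturbation budgets
    attack_mask: (n_channels,) booleans -- which channels are attacked now
    """
    # Uniform noise in [-eps, eps], scaled per channel.
    noise = rng.uniform(-1.0, 1.0, size=messages.shape) * epsilons[:, None]
    # Only attacked channels receive the perturbation.
    return np.where(attack_mask[:, None], messages + noise, messages)

# Example: 3 message channels of dimension 4; channel 1 is left untouched.
rng = np.random.default_rng(0)
msgs = rng.normal(size=(3, 4))
eps = np.array([0.1, 0.5, 0.2])
mask = np.array([True, False, True])
perturbed = perturb_messages(msgs, eps, mask, rng)
```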
In the research, they model the message adversary training process as a cooperative MARL problem, where each adversary observes the local state of one message sender and outputs stochastic actions as perturbations for each message sent to other teammates. Since the adversaries coordinate to minimize the ego system's return, any cooperative MARL approach can be used to train the attacker system. Moreover, to alleviate the overfitting problem of training against a single attacker, they introduce an attacker population learning paradigm, which yields a set of attackers with high attacking quality and behavioral diversity. The ego system and the attackers are then trained in an alternating manner to obtain a robust communication-based policy.
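The sketch below outlines this alternating training scheme, assuming hypothetical helper functions for the individual training and scoring steps (none of these names come from the paper's code):

```python
import random

def alternate_training(ego, population, env, n_iters, pop_size,
                       train_attacker, train_ego, quality, diversity):
    """Alternately train an ego policy and a population of message attackers.

    All helpers (train_attacker, train_ego, quality, diversity) are
    hypothetical stand-ins for the paper's actual components.
    """
    for _ in range(n_iters):
        # Train each attacker against the frozen ego policy. Since the
        # adversaries cooperate to minimize the ego system's return, any
        # cooperative MARL algorithm can optimize them.
        for attacker in population:
            train_attacker(attacker, ego, env)

        # Retain attackers that combine high attacking quality with high
        # behavioral diversity, mitigating overfitting to a single attacker.
        population.sort(
            key=lambda a: quality(a, ego, env) + diversity(a, population),
            reverse=True,
        )
        del population[pop_size:]

        # Train the ego policy against an attacker sampled from the
        # retained population.
        train_ego(ego, random.choice(population), env)
    return ego
```

Sampling the attacker uniformly from the retained population exposes the ego policy to diverse perturbation behaviors during training, rather than letting it specialize against a single adversary.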
Extensive experiments are conducted on various cooperative multi-agent benchmarks that require communication for coordination, including Hallway, two maps from the StarCraft Multi-Agent Challenge (SMAC), a newly created environment called Gold Panner (GP), and Traffic Junction. The experimental results show that MA3C outperforms multiple baselines. Further results demonstrate its strong generalization across various perturbation ranges, and the learned policy can transfer its robustness to new tasks after fine-tuning with only a few samples.
As future work, developing an autonomous paradigm such as curriculum learning to identify the boundary of communication ability is a valuable direction, and designing efficient and effective MARL communication methods for open-environment scenarios remains challenging but of great value.
DOI: 10.1007/s11704-023-2733-5