Partner Modeling in Deep MARL for Theory of Mind Tasks

Description: Theory of Mind (ToM) [11,10] is a crucial ability for interacting effectively with other agents, especially in cooperative settings in which agents need to align their actions to achieve a common goal. Previous work has established the importance of ToM in such settings [3].
We are therefore interested in building agents with Theory of Mind capabilities. One idea is to take inspiration from research on opponent modeling in multi-agent reinforcement learning (MARL) [6,7,9]. In this line of work, agents try to outcompete others by building and maintaining a model of the other agents. Building such a model remains a challenge, but recent progress in model-based reinforcement learning could hold the key to modeling others in MARL. This thesis explores the line of work on latent imagination for this purpose, which has achieved state-of-the-art results on many important benchmarks, starting with Dreamer [1] and DreamerV2 [2] and, more recently, R2I [5]. While all three methods have so far only been explored in single-agent settings to build a world model, we want to use them to also build models of partner agents. We hypothesize that these models can be used to mentalize about (i.e. form a Theory of Mind of) other agents in the environment. This thesis explores this hypothesis by working on the following tasks:
Goals:
- Survey the literature on opponent modeling (starting from [6,7,8,9])
- Integrate work on model-based opponent modeling [9] with work on latent imagination [1,2,5]
- Build a model and algorithm capable of forming a model of other agents
- Test the resulting approach(es) in several cooperative MARL environments (Hanabi [12], Yokai [13], SymmToM [3]) based on the JaxMARL project [4]
- Perform additional analysis.
Supervisor: Constantin Ruhdorfer
Distribution: 20% literature review, 60% implementation, 20% analysis
Requirements: Good knowledge of deep learning and reinforcement learning, strong programming skills in Python and PyTorch and/or JAX, and self-management skills. JAX will need to be learned along the way; experience with PyTorch is sufficient to start.
Literature: [1] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. “Dream to Control: Learning Behaviors by Latent Imagination”. In: International Conference on Learning Representations. 2020.
[2] D. Hafner, T. P. Lillicrap, M. Norouzi, and J. Ba. “Mastering Atari with Discrete World Models”. In: International Conference on Learning Representations. 2021.
[3] M. Sclar, G. Neubig, and Y. Bisk. “Symmetric Machine Theory of Mind”. In: Proceedings of the 39th International Conference on Machine Learning. Ed. by K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato. Vol. 162. Proceedings of Machine Learning Research. PMLR, 17–23 Jul 2022, pp. 19450–19466.
[4] A. Rutherford, B. Ellis, M. Gallici, J. Cook, A. Lupu, G. Ingvarsson, T. Willi, A. Khan, C. S. de Witt, A. Souly, S. Bandyopadhyay, M. Samvelyan, M. Jiang, R. T. Lange, S. Whiteson, B. Lacerda, N. Hawes, T. Rocktaschel, C. Lu, and J. N. Foerster. JaxMARL: Multi-Agent RL Environments in JAX. 2023. arXiv: 2311.10090 [cs.LG].
[5] M. R. Samsami, A. Zholus, J. Rajendran, and S. Chandar. “Mastering Memory Tasks with World Models”. In: The Twelfth International Conference on Learning Representations. 2024.
[6] Raileanu, R., Denton, E., Szlam, A., & Fergus, R. (2018, July). Modeling others using oneself in multi-agent reinforcement learning. In _International conference on machine learning_ (pp. 4257-4266). PMLR.
[7] Yu, X., Jiang, J., Zhang, W., Jiang, H., & Lu, Z. (2022). Model-based opponent modeling. _Advances in Neural Information Processing Systems_, _35_, 28208-28221.
[8] Chandrasekaran, A., Yadav, D., Chattopadhyay, P., Prabhu, V., & Parikh, D. (2017). _It takes two to tango: Towards theory of AI’s mind_. doi:10.48550/ARXIV.1704.00717
[9] X. Yu, J. Jiang, W. Zhang, H. Jiang, and Z. Lu. Model-based opponent modeling. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NeurIPS ’22, Red Hook, NY, USA, 2022. Curran Associates Inc. ISBN 9781713871088.
[10] Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. A., & Botvinick, M. (2018, July). Machine theory of mind. In _International conference on machine learning_ (pp. 4218-4227). PMLR.
[11] Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind?. _Behavioral and brain sciences_, _1_(4), 515-526.
[12] Bard, N., Foerster, J. N., Chandar, S., Burch, N., Lanctot, M., Song, H. F., ... & Bowling, M. (2020). The hanabi challenge: A new frontier for ai research. _Artificial Intelligence_, _280_, 103216.
[13] Fernandez, J., Longin, D., Lorini, E., & Maris, F. (2023). A logical modeling of the Yōkai board game. _AI Communications. The European Journal on Artificial Intelligence_, 1–34. doi:10.3233/aic-230050