
PhyRVD: A Diagnostic Dataset for Physical Reasoning in a Video Dialog Setting


Description: This thesis aims to alleviate the dataset scarcity problem for the Video Dialog task by introducing a novel synthetic dataset that follows an adversarial and physically grounded paradigm. In contrast to DVD/CATER and similar to CLEVRER, we focus on physical reasoning, i.e. enabling AI systems to reason about moving objects that obey the physical laws of motion (e.g. Newton's laws, the law of conservation of momentum, etc.). We ground the reasoning task in a Video Dialog setting to (1) explore involved and interconnected reasoning chains that only appear in multi-round conversations and (2) teach the networks to justify their predicted answers in natural language, leading to more explainable decision-making capabilities.
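As a rough illustration of the kind of physical law such reasoning rests on, the following minimal sketch computes the outcome of a perfectly elastic one-dimensional collision between two objects and checks that total momentum is conserved. It is not part of PhyRVD or its generation pipeline; the function names and the numeric values are hypothetical and chosen only for this example.

```python
import math


def elastic_collision_1d(m1, v1, m2, v2):
    """Velocities after a perfectly elastic 1D collision (standard textbook result)."""
    total_mass = m1 + m2
    v1_after = ((m1 - m2) * v1 + 2 * m2 * v2) / total_mass
    v2_after = ((m2 - m1) * v2 + 2 * m1 * v1) / total_mass
    return v1_after, v2_after


def total_momentum(m1, v1, m2, v2):
    """Total linear momentum of the two-object system."""
    return m1 * v1 + m2 * v2


# Hypothetical scene: a 2 kg object moving at 3 m/s hits a 1 kg object at rest.
m1, v1, m2, v2 = 2.0, 3.0, 1.0, 0.0
v1_after, v2_after = elastic_collision_1d(m1, v1, m2, v2)

# Momentum before and after the collision should match.
momentum_conserved = math.isclose(
    total_momentum(m1, v1, m2, v2),
    total_momentum(m1, v1_after, m2, v2_after),
)
print(v1_after, v2_after, momentum_conserved)
```

A dialog question such as "Which object moves faster after the collision?" could, under these assumptions, be answered by comparing the two returned velocities.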


Supervisor: Adnen Abdessaied

Distribution: 20% Literature, 40% Implementation, 40% Analysis

Requirements: Strong knowledge of Python and an interest in deep learning and machine learning

Literature: Le, Hung, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur. 2021. DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP).

Yi, Kexin, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum. 2020. CLEVRER: Collision Events for Video Representation and Reasoning. International Conference on Learning Representations (ICLR).