MultiMediate ’25: Cross-cultural Multi-domain Engagement Estimation
Daksitha Senel Withanage Don, Marius Funk, Michal Balazia, Huajian Qiu, Shogo Okada, François Brémond, Jan Alexandersson, Andreas Bulling, Elisabeth André, Philipp Müller
Proc. of the 33rd ACM International Conference on Multimedia (MM), pp. 14150–14155, 2025.
Abstract
Estimating momentary conversational engagement is central to assistive, socially aware AI systems, yet models are typically trained and evaluated within a single domain, limiting real-world robustness. The MultiMediate ’25 challenge advances engagement estimation to more challenging, cross-cultural, and multi-domain settings. Building on prior challenge editions, we expand beyond NOXI as the sole training source by introducing NOXI-J, a new multilingual corpus covering Japanese and Chinese interactions, enabling both training and evaluation in diverse linguistic contexts. Although NOXI-J conceptually extends NOXI, we treat it as a distinct domain because linguistic, cultural, capture, and annotation differences induce measurable distribution shifts. In this paper, we present new annotations, precomputed multi-modal features (visual, vocal, and verbal), baseline evaluations, and an analysis of the best performing challenge solutions. Beyond accuracy, we quantify fairness using Conditional Demographic Disparity for gender and language. Our baselines confirm strong in-domain performance (e.g., paralinguistic eGeMAPS and video-transformer features) and reveal notable cross-domain drops, underscoring the challenge of cultural, linguistic, and interactional shifts. Fairness analyses indicate generally small discrepancies for our baselines. We observe the largest disparities for the proposed challenge solutions on the Chinese language test set. All annotations, features, code, and leaderboards are made publicly available to foster sustained progress on robust and fair engagement estimation.
Links
Paper: withanage25_mm.pdf
BibTeX
@inproceedings{withanage25_mm,
title = {{{MultiMediate}} '25: {{Cross-cultural Multi-domain Engagement Estimation}}},
shorttitle = {{{MultiMediate}} '25},
booktitle = {Proc. of the 33rd {{ACM International Conference}} on {{Multimedia}} (MM)},
author = {Withanage Don, Daksitha Senel and Funk, Marius and Balazia, Michal and Qiu, Huajian and Okada, Shogo and Brémond, François and Alexandersson, Jan and Bulling, Andreas and André, Elisabeth and Müller, Philipp},
year = {2025},
series = {{{MM}} '25},
pages = {14150--14155},
publisher = {Association for Computing Machinery (ACM)},
doi = {10.1145/3746027.3762076}
}