Multimodal Learning for Human Pose Forecasting

Description: Human pose forecasting is an active area of research with significant importance for a variety of applications, including assistive devices, human-computer interaction, autonomous driving, and VR/AR. However, although many works have revealed correlations between human pose and other modalities, such as the objects a person intends to interact with, existing methods mainly focus on features from past poses (i.e. they forecast future pose solely from past pose) and ignore the influence of other modalities. It is therefore interesting and meaningful to explore the effectiveness of other modalities and to utilise multimodal features for forecasting human pose.

Goal: Explore the effectiveness of different modalities for human pose forecasting, and develop deep learning methods that forecast human pose from multimodal features.
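As a rough illustration of the difference between pose-only and multimodal forecasting, the sketch below fuses an encoded past-pose sequence with one extra modality by concatenating per-modality features before decoding future poses. It is a minimal sketch, not the method expected from the project: the class name `FeatureFusionForecaster`, the choice of gaze direction as the extra modality (the modality studied in GIMO, listed under Literature), the joint count, window lengths, hidden size and the residual "offset from the last observed frame" output are all hypothetical assumptions for illustration.

```python
import torch
import torch.nn as nn


class FeatureFusionForecaster(nn.Module):
    """Hypothetical feature-level fusion baseline: past pose + gaze -> future pose.

    Assumed shapes (illustrative only):
      past_pose: (B, T_in, J*3)   observed 3D joint positions
      gaze:      (B, T_in, 3)     per-frame gaze direction
      returns:   (B, T_out, J*3)  predicted future joint positions
    """

    def __init__(self, num_joints=21, t_in=10, t_out=10, gaze_dim=3, hidden=256):
        super().__init__()
        self.t_out = t_out
        self.pose_dim = num_joints * 3
        # One encoder per modality; nn.Flatten merges the time and feature axes.
        self.pose_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(t_in * self.pose_dim, hidden), nn.ReLU()
        )
        self.gaze_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(t_in * gaze_dim, hidden), nn.ReLU()
        )
        # Decoder maps the concatenated (fused) features to all future frames at once.
        self.decoder = nn.Linear(2 * hidden, t_out * self.pose_dim)

    def forward(self, past_pose, gaze=None):
        h_pose = self.pose_enc(past_pose)
        # gaze=None replaces the gaze embedding with zeros, a simple way to
        # feed pose-only input through the same architecture.
        h_gaze = self.gaze_enc(gaze) if gaze is not None else torch.zeros_like(h_pose)
        delta = self.decoder(torch.cat([h_pose, h_gaze], dim=-1))
        delta = delta.view(-1, self.t_out, self.pose_dim)
        # Residual formulation: predict offsets from the last observed frame.
        return past_pose[:, -1:, :] + delta


if __name__ == "__main__":
    model = FeatureFusionForecaster()
    past = torch.randn(4, 10, 63)   # batch of 4, 10 frames, 21 joints x 3
    gaze = torch.randn(4, 10, 3)
    print(model(past, gaze).shape)  # multimodal input
    print(model(past).shape)        # pose-only input
```

Concatenation is only the simplest fusion choice; attention-based fusion in the style of Perceiver IO (also listed under Literature) would be a natural next step when comparing modalities.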

Supervisor: Zhiming Hu

Distribution: 60% Implementation, 20% Literature, 20% Analysis

Requirements: Good knowledge of deep learning, strong programming skills in Python and PyTorch. Preferable: knowledge of multimodal learning.

Literature: Jaegle, A., et al. 2022. Perceiver IO: A general architecture for structured inputs & outputs. International Conference on Learning Representations (ICLR).

Zheng, Y., et al. 2022. GIMO: Gaze-Informed Human Motion Prediction in Context. European Conference on Computer Vision (ECCV).

Martinez, J., et al. 2017. On human motion prediction using recurrent neural networks. IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR).

Ma, T., et al. 2022. Progressively generating better initial guesses towards next stages for high-quality human motion prediction. IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR).

Li, M., et al. 2020. Dynamic multiscale graph neural networks for 3D skeleton based human motion prediction. IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR).

Corona, E., et al. 2020. Context-aware human motion prediction. IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR).
