Multimodal Learning for Human Eyes and Human Motion

Description: Human eye movements and human body motion are two important behavioural modalities with great relevance for a variety of applications in computer vision and human-computer interaction. Specifically, eye gaze signals can be used for gaze-based interaction, gaze-contingent rendering, and gaze-based activity recognition, while human motion information is important for assistive devices, autonomous driving, and virtual and augmented reality. In daily activities, eye movements and body motion are strongly correlated, i.e. each significantly influences the other. Recently, multimodal learning approaches have demonstrated superior performance in exploring the links between different modalities. It is therefore promising to explore the correlations between human eyes and human motion using multimodal learning methods.
Goal:
- Estimate eye gaze from human motion
- Forecast future eye gaze from past eye gaze and human motion
- Estimate human motion from eye gaze
- Forecast future human motion from past human motion and eye gaze
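To make the forecasting goals concrete, the following minimal sketch shows one possible way to turn a synchronized gaze and motion recording into (input, target) training pairs for gaze forecasting. It is an illustration only: the function name `make_forecast_samples`, the window lengths, and the toy feature dimensions (2-D gaze, 3-D motion) are assumptions made here and are not prescribed by the project.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]
Sample = Tuple[List[Frame], List[Frame]]


def make_forecast_samples(
    gaze: Sequence[Frame],
    motion: Sequence[Frame],
    past: int,
    future: int,
) -> List[Sample]:
    """Pair each window of past gaze+motion frames with the gaze frames that follow it.

    Hypothetical helper: assumes both streams are sampled at the same rate
    and are already aligned frame by frame.
    """
    if len(gaze) != len(motion):
        raise ValueError("gaze and motion must have the same number of frames")
    samples: List[Sample] = []
    for start in range(len(gaze) - past - future + 1):
        split = start + past
        # Input: per-frame concatenation of gaze and motion features.
        inputs = [
            tuple(g) + tuple(m)
            for g, m in zip(gaze[start:split], motion[start:split])
        ]
        # Target: the gaze frames immediately after the input window.
        targets = [tuple(g) for g in gaze[split:split + future]]
        samples.append((inputs, targets))
    return samples


# Toy recording (assumed shapes): 5 frames, 2-D gaze, 3-D motion.
gaze = [(float(t), float(t)) for t in range(5)]
motion = [(float(t), 0.0, 0.0) for t in range(5)]
samples = make_forecast_samples(gaze, motion, past=3, future=1)
```

Under the same assumptions, taking the targets from the motion stream instead of the gaze stream would give the motion-forecasting variant. The resulting pairs could then be fed to any sequence model, for example one implemented in PyTorch.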
Supervisor: Zhiming Hu
Distribution: 70% Implementation, 10% Literature, 20% Analysis
Requirements: Good knowledge of deep learning, strong programming skills in Python and PyTorch. Preferable: Knowledge of multimodal learning.
Literature: Kratzer, P., et al. (2020). "MoGaze: A Dataset of Full-Body Motions that Includes Workspace Geometry and Eye-Gaze." IEEE Robotics and Automation Letters 6(2): 367-373.
Hu, Z., et al. (2020). "DGaze: CNN-Based Gaze Prediction in Dynamic Scenes." IEEE Transactions on Visualization and Computer Graphics 26(5): 1902-1911.
Hu, Z., et al. (2021). "FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments." IEEE Transactions on Visualization and Computer Graphics.
Hu, Z., et al. (2021). "EHTask: Recognizing User Tasks from Eye and Head Movements in Immersive Virtual Reality." IEEE Transactions on Visualization and Computer Graphics.
Ma, T., et al. (2022). "Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.