Prioritized Trajectory Replay with Neuro-Inspired Representation

Description: Reinforcement learning (RL) agents must learn feature representations from raw environmental observations, and these representations form the basis for learning a behavioral policy. However, they are often task-specific and lack the flexibility seen in human neural spatial representations.
Spatial Semantic Pointers (SSPs), inspired by neural and cognitive mechanisms, offer a promising way to encode locations and environments dynamically. SSPs have been shown to improve robustness and performance in RL navigation tasks (Bartlett et al., 2023). Additionally, Dumont et al. (2022) demonstrated the utility of SSPs in spiking neural networks for path integration, the process by which an agent estimates its current position by continuously updating its belief based on its own actions. Path integration allows an agent to track its location without external cues, and it can be used to maintain a compressed representation of complete trajectories.
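To make the SSP idea concrete, the sketch below shows one standard NumPy construction: axis vectors with unit-magnitude Fourier coefficients, fractional binding via element-wise exponentiation in the Fourier domain, and path integration by binding the current location SSP with the SSP of each displacement. The dimensionality, displacement sequence, and function names are illustrative choices, not details of the cited models.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
d = 256  # SSP dimensionality (an illustrative choice)

def make_unitary(d, rng):
    """Random unitary axis vector: unit-magnitude Fourier coefficients,
    conjugate-symmetric so the time-domain vector is real-valued."""
    phases = rng.uniform(-np.pi, np.pi, d // 2 - 1)
    F = np.ones(d, dtype=complex)
    F[1:d // 2] = np.exp(1j * phases)
    F[d // 2 + 1:] = np.conj(F[1:d // 2][::-1])
    return np.fft.ifft(F).real

def bind(a, b):
    """Binding = circular convolution, computed in the Fourier domain."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def power(v, exponent):
    """Fractional binding: raise an axis vector to a real-valued exponent."""
    return np.fft.ifft(np.fft.fft(v) ** exponent).real

X, Y = make_unitary(d, rng), make_unitary(d, rng)

def encode(x, y):
    """SSP for a continuous 2D location: S(x, y) = X^x (*) Y^y."""
    return bind(power(X, x), power(Y, y))

# Path integration: starting at the origin, update the location SSP by
# binding it with the SSP of each displacement (dx, dy).
state = encode(0.0, 0.0)
for dx, dy in [(0.1, 0.0), (0.1, 0.05), (0.0, 0.1)]:
    state = bind(state, encode(dx, dy))

# The result matches the SSP of the net position; similarity (dot product)
# with candidate position SSPs can be used to decode where the agent is.
print(state @ encode(0.2, 0.15))  # close to 1
```

A compressed trajectory representation can be built along the same lines, for example by superimposing position SSPs bound to powers of a temporal axis vector, so that an entire path is held in a single fixed-width vector.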
Prioritized Experience Replay (PER; Schaul et al., 2016) is a data sampling technique that selects and replays the transitions most beneficial for learning. Liang et al. (2021) proposed PTR-PPO, proximal policy optimization with prioritized trajectory replay, which adapts PER to the online setting of the popular PPO agent, using a trajectory preference module that samples the most important trajectories. Interestingly, neural signatures of trajectory replay have been observed in the brains of animals and humans (Ólafsdóttir et al., 2018).
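The trajectory-level prioritization at the heart of PTR can likewise be sketched in a few lines. The buffer below stores whole trajectories with a scalar priority (assumed here to be the mean absolute advantage; PTR-PPO considers several GAE-based priority definitions) and samples them with probability proportional to priority raised to alpha, with importance-sampling weights correcting for the resulting bias as in PER. Class and parameter names are illustrative, not the actual PTR-PPO implementation.

```python
import numpy as np

class PrioritizedTrajectoryBuffer:
    """Minimal sketch of trajectory-level prioritized replay (not the exact PTR-PPO buffer)."""

    def __init__(self, capacity=256, alpha=0.6, rng=None):
        self.capacity = capacity   # maximum number of stored trajectories
        self.alpha = alpha         # how strongly priorities shape the sampling distribution
        self.trajectories = []
        self.priorities = []
        self.rng = rng if rng is not None else np.random.default_rng()

    def add(self, trajectory, advantages):
        """Store a trajectory; its priority is the mean absolute advantage (an assumption here)."""
        priority = float(np.mean(np.abs(advantages))) + 1e-6
        if len(self.trajectories) >= self.capacity:
            # drop the oldest trajectory once the buffer is full
            self.trajectories.pop(0)
            self.priorities.pop(0)
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def sample(self, batch_size):
        """Sample trajectories with probability proportional to priority**alpha."""
        p = np.asarray(self.priorities, dtype=float) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.trajectories), size=batch_size, p=p)
        # importance-sampling weights correct for the non-uniform sampling (beta = 1 here)
        weights = 1.0 / (len(self.trajectories) * p[idx])
        weights /= weights.max()
        return [self.trajectories[i] for i in idx], weights
```

In a PTR-style training loop, each finished rollout would be added together with its advantage estimates, and the sampled trajectories would then be reused for additional PPO updates.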
This project aims to combine the SSP-based trajectory representations with PTR algorithms for robust RL. The goals are twofold: 1) to adapt PTR for online settings using neuro-cognitive representations of trajectories, and 2) to explore the applicability of PTR in understanding neural replay mechanisms.
Supervisors: Nicole Dumont and Anna Penzkofer
Distribution: 20% literature review, 60% implementation, 20% analysis
Requirements: programming proficiency in Python, some experience with reinforcement learning, good self-management skills.
Literature:
[1] Bartlett, M., Simone, K., Dumont, N. D., Furlong, M., Eliasmith, C., Orchard, J., & Stewart, T. (2023). Improving reinforcement learning with biologically motivated continuous state representations. Proceedings of the 21st International Conference on Cognitive Modeling.
[2] Dumont, N. S.-Y., Orchard, J., & Eliasmith, C. (2022). A model of path integration that connects neural and symbolic representation. Proceedings of the Annual Meeting of the Cognitive Science Society, 44(44).
[3] Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized Experience Replay. arXiv.
[4] Liang, X., Ma, Y., Feng, Y., & Liu, Z. (2021). PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay. arXiv.
[5] Ólafsdóttir, H. F., Bush, D., & Barry, C. (2018). The Role of Hippocampal Replay in Memory and Planning. Current Biology, 28(1), R37–R50.