Pretrained visual representations in reinforcement learning

Abstract:

Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares RL algorithms that train a convolutional neural network (CNN) from scratch with those that use pretrained visual representations (PVRs). We evaluate Dormant Ratio Minimization (DRM), a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC), on the Metaworld Push-v2 and Drawer-Open-v2 tasks. Our results show that whether training from scratch or using a PVR yields higher performance is task-dependent, but that PVRs reduce the required replay buffer size and shorten training times. We also identify a strong correlation between the dormant ratio and model performance, highlighting the importance of exploration in visual RL. Our study provides insights into the trade-offs between training from scratch and using PVRs, informing the design of future visual RL algorithms.
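
The abstract does not define the dormant ratio, so the following is only a minimal sketch under an assumed definition: a neuron counts as dormant when its mean absolute activation over a batch, divided by the average of that quantity across its layer, is at most a threshold tau. The function name `dormant_ratio`, the nested-list input layout, and the default `tau` are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Sequence


def dormant_ratio(
    layers: Sequence[Sequence[Sequence[float]]], tau: float = 0.0
) -> float:
    """Fraction of tau-dormant neurons across all layers (assumed definition).

    layers[l][b][i] is the activation of neuron i in layer l for input b.
    """
    dormant = 0
    total = 0
    for acts in layers:
        n_inputs = len(acts)
        n_neurons = len(acts[0])
        # Mean absolute activation of each neuron over the batch.
        mean_abs = [
            sum(abs(row[i]) for row in acts) / n_inputs
            for i in range(n_neurons)
        ]
        # Average of those per-neuron values across the layer.
        layer_mean = sum(mean_abs) / n_neurons
        for m in mean_abs:
            # Normalised score; an all-zero layer gets score 0 for every neuron.
            score = m / layer_mean if layer_mean > 0 else 0.0
            if score <= tau:
                dormant += 1
        total += n_neurons
    return dormant / total if total else 0.0


# Hypothetical activations: layer_a has one always-zero neuron, layer_b has none.
layer_a = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
layer_b = [[1.0, 1.0]]
ratio = dormant_ratio([layer_a, layer_b])
```

In this toy input one of five neurons never activates, so `ratio` is 0.2. In practice the activations would be recorded from the agent's networks during training, and a rising ratio would indicate that a growing share of the network has stopped contributing.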

BibTeX:

@inproceedings{williams2024pretrained,
  title = {Pretrained visual representations in reinforcement learning},
  author = {Williams, Emlyn and Polydoros, Athanasios},
  booktitle = {Annual Conference Towards Autonomous Robotic Systems},
  pages = {60--71},
  year = {2024},
  pdflink1 = {https://www.researchgate.net/profile/Athanasios-Polydoros/publication/382528670_Pretrained_Visual_Representations_in_Reinforcement_Learning/links/66a1f2945919b66c9f6885fe/Pretrained-Visual-Representations-in-Reinforcement-Learning.pdf},
  organization = {Springer},
  public = {yes}
}