Konstantina Mammou, "A Gaze Prediction Model for VR Task-Oriented Environments", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024
https://doi.org/10.26233/heallink.tuc.101799
Gaze prediction in Virtual Reality (VR) has attracted significant attention due to its potential to enhance user interaction and optimize VR applications such as gaze-contingent rendering. The dynamic and immersive nature of VR presents unique challenges, particularly for predicting gaze in task-oriented environments as opposed to free-viewing or static ones. This thesis proposes a model for gaze prediction in such environments and investigates the role of temporal continuity in enabling accurate predictions. The proposed model comprises three key modules: the Image Sequence Module (ISM) uses ConvLSTM layers to capture temporal motion features from sequences of frames, the Gaze Sequence Module (GSM) employs LSTM layers to extract temporal patterns from past gaze data, and the Fusion Module integrates the outputs of both to predict a single gaze point. The OpenNEEDS dataset, which offers diverse VR scenarios and gaze recordings, was used for training. Preprocessing included frame and gaze-point normalization, conversion of 3D gaze vectors to 2D visual angles, outlier removal, and sequence creation. The model was evaluated using metrics such as angular error and recall rate, and it significantly outperformed baseline methods; however, runtime performance remains a limitation, indicating the need for further optimization before real-time use. This work contributes a robust, adaptable, and consistent model for gaze prediction in task-oriented VR environments and demonstrates the potential of leveraging temporal continuity for accurate gaze prediction.
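
As a rough illustration of the three-module design summarized above, the sketch below wires a ConvLSTM-based ISM, an LSTM-based GSM, and a dense fusion head together in Keras. The sequence length, frame resolution, layer widths, and output convention (a 2D gaze point in visual angles) are placeholder assumptions chosen to make the example runnable, not values taken from the thesis.

    # Minimal sketch of the ISM / GSM / Fusion architecture (assumed shapes and sizes).
    from tensorflow.keras import layers, Model

    SEQ_LEN, H, W, C = 8, 64, 64, 3   # assumed sequence length and frame size
    GAZE_DIM = 2                      # 2D visual angles (yaw, pitch)

    # Image Sequence Module (ISM): temporal motion features from frame sequences
    frames_in = layers.Input(shape=(SEQ_LEN, H, W, C), name="frames")
    x = layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False)(frames_in)
    x = layers.GlobalAveragePooling2D()(x)

    # Gaze Sequence Module (GSM): temporal patterns from past gaze points
    gaze_in = layers.Input(shape=(SEQ_LEN, GAZE_DIM), name="gaze_history")
    g = layers.LSTM(64)(gaze_in)

    # Fusion Module: combine both streams and regress a single 2D gaze point
    fused = layers.Concatenate()([x, g])
    fused = layers.Dense(128, activation="relu")(fused)
    gaze_out = layers.Dense(GAZE_DIM, name="predicted_gaze")(fused)

    model = Model(inputs=[frames_in, gaze_in], outputs=gaze_out)
    model.compile(optimizer="adam", loss="mse")
    model.summary()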
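
Angular error, one of the reported metrics, is commonly computed as the angle between predicted and ground-truth gaze directions. A minimal NumPy version is sketched below, assuming both gaze points are given as (yaw, pitch) visual angles in degrees; the exact axis convention used in the thesis may differ.

    import numpy as np

    def angular_error_deg(pred, target):
        """Angle (degrees) between two gaze directions given as (yaw, pitch)
        visual angles in degrees. Both are converted to unit direction vectors."""
        def to_vec(a):
            yaw, pitch = np.radians(a[..., 0]), np.radians(a[..., 1])
            return np.stack([np.cos(pitch) * np.sin(yaw),
                             np.sin(pitch),
                             np.cos(pitch) * np.cos(yaw)], axis=-1)
        v1, v2 = to_vec(pred), to_vec(target)
        cos = np.clip(np.sum(v1 * v2, axis=-1), -1.0, 1.0)
        return np.degrees(np.arccos(cos))

    # Example: a 1-degree yaw offset yields ~1 degree of angular error
    print(angular_error_deg(np.array([1.0, 0.0]), np.array([0.0, 0.0])))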