The work titled "Deep reinforcement learning reward function design for lane-free autonomous driving" by Karalakou Athanasia is licensed under a Creative Commons Attribution 4.0 International license.
Bibliographic Citation
Athanasia Karalakou, "Deep reinforcement learning reward function design for lane-free autonomous driving", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2022
https://doi.org/10.26233/heallink.tuc.92889
Lane-free traffic is a novel and challenging research domain, in which vehicles do not adhere to the notion of lanes, but may instead occupy any lateral position within the road boundaries. This constitutes an entirely different problem setting for autonomous driving compared to lane-based traffic, as vehicles consider the entire two-dimensional space available, and their decision-making needs to adapt to this concept. There is no leader vehicle and no lane-changing operation to adjacent lanes; the vehicles' observations must therefore properly accommodate the lane-free environment without carrying over bias from lane-based approaches. In addition, each vehicle wishes to maintain its own desired speed, creating many situations in which vehicles need to perform overtaking and react appropriately to the behavior of others. At the same time, Deep Reinforcement Learning (DRL) has already been used in a variety of applications, and its ability to handle high-dimensional state and action spaces makes it suitable for controlling autonomous vehicles. Existing studies, however, have not employed Reinforcement Learning (deep or otherwise) in the lane-free traffic domain.

Against this background, this diploma thesis initiates the study of applying (Deep) Reinforcement Learning to lane-free traffic environments. To this end, we put forward a Markov Decision Process (MDP) formulation for the problem of lane-free autonomous driving, addressing all of its elements. We consider a two-dimensional continuous action space, along with a discretized form, as well as the state space. Our main focus is on designing an effective reward function, as the reward model is crucial and determines the overall efficiency of the resulting policy.

Specifically, we construct different reward-function components tied to the environment at various levels of information. We then combine and compare these components, aiming for a reward function whose resulting policy both reduces collisions among vehicles and addresses their requirement of maintaining a desired speed. Additionally, we study the performance of two popular DRL algorithms, namely Deep Q-Networks (DQN, enhanced with some commonly used extensions) and Deep Deterministic Policy Gradient (DDPG). Our experimental results indicate that DDPG has an overall better performance, and confirm that our DRL-based autonomous vehicles are able to gradually learn effective policies in environments of varying difficulty.
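The MDP formulation above considers both a two-dimensional continuous action space and a discretized variant of it. As a minimal illustrative sketch (not the thesis's actual parameters), the following Python snippet models both using the Gymnasium API; the acceleration bounds and the 5x5 discretization grid are assumptions chosen only for illustration.

```python
import numpy as np
from gymnasium import spaces

# Continuous 2-D action: (longitudinal, lateral) acceleration.
# Bounds are hypothetical, for illustration only.
continuous_actions = spaces.Box(
    low=np.array([-4.0, -2.0], dtype=np.float32),   # hard braking / left
    high=np.array([3.0, 2.0], dtype=np.float32),    # full throttle / right
)

# A discretized form suitable for DQN: a small grid over the same space.
lon_levels = np.linspace(-4.0, 3.0, 5)
lat_levels = np.linspace(-2.0, 2.0, 5)
discrete_actions = [(a_lon, a_lat)
                    for a_lon in lon_levels
                    for a_lat in lat_levels]
discrete_space = spaces.Discrete(len(discrete_actions))
```

A continuous space of this kind pairs naturally with DDPG, while the discretized grid is what a DQN-style agent would index into.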
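The reward design combines components that discourage collisions and close proximity to other vehicles while rewarding adherence to each vehicle's desired speed. The sketch below shows one way such components could be weighted and combined; the component definitions, names, and weights are hypothetical and serve only to illustrate the composite-reward idea, not the thesis's actual formulation.

```python
import numpy as np

def composite_reward(ego_speed, desired_speed, min_gap, collided,
                     w_speed=1.0, w_gap=0.5, w_collision=10.0):
    """Hypothetical composite reward for a lane-free driving agent.

    Combines a desired-speed term, a proximity penalty, and a
    collision penalty into a single weighted sum.
    """
    # Penalize deviation from the vehicle's own desired speed,
    # normalized so the term is comparable across vehicles.
    r_speed = -abs(ego_speed - desired_speed) / desired_speed
    # Penalize small 2-D gaps to surrounding vehicles; the penalty
    # decays toward zero as the nearest vehicle gets farther away.
    r_gap = -np.exp(-min_gap)
    # Large fixed penalty on collision events.
    r_collision = -w_collision if collided else 0.0
    return w_speed * r_speed + w_gap * r_gap + r_collision
```

In practice, the relative weights of such components would be tuned empirically, since over-weighting the speed term encourages risky overtaking while over-weighting the proximity term yields overly conservative driving.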