Reinforcement learning for autonomous unmanned aerial vehicles

Geramanis Nikolaos

Full record

URI:

http://purl.tuc.gr/dl/dias/EEF7B227-2A2B-4BB0-8ED1-579574DE4D69

Year

2020

Type of Item

Diploma Work

License

Details

Bibliographic Citation

Nikolaos Geramanis, "Reinforcement learning for autonomous unmanned aerial vehicles", Diploma Work, School of Electrical and Computer Engineering, Technical Univesity of Crete, Chania, Greece, 2020 https://doi.org/10.26233/heallink.tuc.87066

Appears in Collections

Diploma Works in Community School of Electrical and Computer Engineering

Diploma Works in Community Intelligent Systems Laboratory

Summary

Reinforcement learning is an area of machine learning concerned with how autonomous agents learn to behave in unknown environments through trial-and-error. The goal of a reinforcement learning agent is to learn a sequential decision policy that maximizes the notion of cumulative reward through continuous interaction with the unknown environment. A challenging problem in robotics is the autonomous navigation of an Unmanned Aerial Vehicle (UAV) in worlds with no available map. This ability is critical in many applications, such as search and rescue operations or the mapping of geographical areas. In this thesis, we present a map-less approach for the autonomous, safe navigation of a UAV in unknown environments using reinforcement learning. Specifically, we implemented two popular algorithms, SARSA(λ) and Least-Squares Policy Iteration (LSPI), and combined them with tile coding, a parametric, linear approximation architecture for value function in order to deal with the 5- or 3-dimensional continuous state space defined by the measurements of the UAV distance sensors. The final policy of each algorithm, learned over only 500 episodes, was tested in unknown environments more complex than the one used for training in order to evaluate the behavior of each policy. Results show that SARSA(λ) was able to learn a near-optimal policy that performed adequately even in unknown situations, leading the UAV along paths free-of-collisions with obstacles. LSPI's policy required less learning time and its performance was promising, but not as effective, as it occasionally leads to collisions in unknown situations. The whole project was implemented using the Robot Operating System (ROS) framework and the Gazebo robot simulation environment.

Search

Browse

My Space

Reinforcement learning for autonomous unmanned aerial vehicles

Geramanis Nikolaos

Summary

Available Files

Services

Export

Share

Statistics

Metadata & Content in a METS Package:

Metadata in Format: