Institutional Repository
Technical University of Crete




Visual localization in unstructured environments through deep learning

Petrakis Georgios


Year 2023
Type of Item Doctoral Dissertation
Bibliographic Citation Georgios Petrakis, "Visual localization in unstructured environments through deep learning", Doctoral Dissertation, School of Mineral Resources Engineering, Technical University of Crete, Chania, Greece, 2023


Scene understanding, localization and mapping play a crucial role in computer vision, robotics and geomatics, providing valuable knowledge through a vast and growing number of methodologies and applications. However, although the literature flourishes with related studies in urban and indoor environments, far fewer studies concentrate on unstructured environments.

The main goal of this dissertation is to design and develop a deep-learning-based visual localization framework that enhances scene understanding and the potential of autonomous navigation in challenging unstructured scenes, and to develop a precise positioning methodology for characteristic-point localization in GNSS-denied environments. The dissertation can be divided into five parts: (a) design of the training and evaluation datasets; (b) implementation and improvement of a keypoint detection and description neural network for unstructured environments; (c) implementation and development of a lightweight neural network for visual localization focused on unstructured environments, and integration of the trained model into a SLAM (Simultaneous Localization and Mapping) system as a feature extraction module; (d) development of a lightweight encoder-decoder architecture for lunar ground segmentation; (e) development of a precise positioning and mapping alternative for GNSS-denied environments.

Regarding the first part of the dissertation, two datasets were designed and created for the training and evaluation of keypoint detectors and descriptors.
The training dataset includes 48,000 FPV (First-Person-View) images with a wide range of landscape variations, including images from Earth, the Moon and Mars, while the evaluation dataset includes about 120 sequences of planetary(-like) scenes, where each sequence contains the original image and five generated representations of the same scene under different illumination and viewpoint conditions.

In the second part of the dissertation, a self-supervised neural network architecture called SuperPoint was implemented and modified to investigate its efficiency in keypoint detection and description in unstructured and planetary scenes. Three SuperPoint models were produced: (a) an original SuperPoint model trained from scratch, (b) a fine-tuned original SuperPoint model, and (c) an optimized SuperPoint model trained from scratch. The experimentation showed that the optimized SuperPoint model provides superior performance compared with the original SuperPoint models and with handcrafted keypoint detectors and descriptors.

Concerning the third part of the dissertation, a multi-task deep learning architecture was developed for keypoint detection and description, focused on feature-poor unstructured and planetary scenes with low or changing illumination; the training and evaluation processes were conducted using the proposed datasets. Moreover, the trained model was integrated into a visual SLAM (Simultaneous Localization and Mapping) system as a feature extraction module and tested in two feature-poor unstructured areas.
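Evaluating learned keypoint descriptors such as those above typically involves matching descriptors between an original image and a re-illuminated or re-projected view of the same scene. The dissertation's actual evaluation pipeline is not reproduced here; as an illustration only, the sketch below shows a standard mutual-style nearest-neighbour matcher with Lowe's ratio test in numpy (the function name and `ratio` value are hypothetical, not from the source).

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    desc_a: (N, D) array of descriptors from the first image.
    desc_b: (M, D) array of descriptors from the second image (M >= 2).
    Returns a list of (index_a, index_b) pairs that pass the ratio test.
    """
    # Pairwise Euclidean distances between every descriptor pair.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i in range(dists.shape[0]):
        order = np.argsort(dists[i])
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up.
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches
```

The ratio test discards ambiguous matches, which matters most in the feature-poor scenes the dissertation targets, where many descriptors look alike.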
Regarding the results, the proposed architecture provides increased accuracy in keypoint description, outperforming well-known handcrafted algorithms, while the proposed SLAM achieved superior results in areas with medium and low illumination compared with the ORB-SLAM2 algorithm.

In the fourth part of the dissertation, a lightweight encoder-decoder neural network (NN) architecture is proposed for rover-based ground segmentation on the lunar surface. The proposed architecture is composed of a modified MobileNetV2 encoder and a lightweight U-net decoder, and the training and evaluation processes were conducted using a publicly available synthetic dataset of lunar landscape images. The proposed model provides robust segmentation results, achieving accuracy similar to the original U-net and U-net-based architectures, which are 110-140 times larger than the proposed architecture. This study aims to contribute to lunar ground segmentation using deep learning techniques, and it demonstrates significant potential for autonomous lunar navigation, ensuring safer and smoother navigation on the Moon.

Regarding the fifth part of the dissertation, a precise positioning alternative was developed to localize fiducial markers and characteristic points of the scene, providing their local coordinates in 3D space with a high level of accuracy. First, fiducial markers are placed in the scene; one of them serves as the origin marker, while the target markers represent the characteristic points or features.
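Claims that a compact model "achieves similar accuracy" to much larger U-net variants are usually backed by per-class segmentation metrics such as intersection-over-union. The dissertation's exact metric definitions are not given in this abstract; the following numpy sketch (function name hypothetical) shows the standard per-class IoU computation one would use to compare ground-segmentation masks.

```python
import numpy as np

def segmentation_iou(pred, target, num_classes):
    """Per-class intersection-over-union for integer label masks.

    pred, target: integer arrays of the same shape holding class labels.
    Returns one IoU score per class (NaN if a class appears in neither mask).
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious
```

Averaging the per-class scores gives the mean IoU commonly reported when comparing segmentation architectures of different sizes.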
Subsequently, the proposed SLAM algorithm enables an RGB-Depth camera to map the desired area and localize itself in an unknown and challenging environment, while, in combination with geometric transformations, localization and optimization techniques, the methodology estimates the coordinates of the target markers and an arbitrary point cloud that approximates the structure of the environment.

It is clear that the use of deep learning in unstructured and planetary environments for scene recognition, localization and mapping holds significant potential for future applications, reinforcing crucial topics such as autonomous navigation in hazardous and unknown environments. This dissertation aspires to encourage the investigation and development of AI models and datasets focused on planetary exploration missions, and especially on high- and low-level scene understanding using computationally efficient equipment and methods, reducing the economic and energy costs of robotic systems.
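The marker-based positioning described in the fifth part rests on a simple geometric idea: once the camera (via SLAM) knows each marker's pose in its own frame, a target marker's position can be re-expressed in the origin marker's coordinate frame by a rigid transform. The full methodology also involves optimization, which is not reproduced here; the numpy sketch below (function name hypothetical) shows only that core change-of-frame step.

```python
import numpy as np

def target_in_origin_frame(R_origin, t_origin, t_target):
    """Express a target marker's camera-frame position in the origin marker's frame.

    R_origin: (3, 3) rotation of the origin marker relative to the camera.
    t_origin: (3,) position of the origin marker in the camera frame.
    t_target: (3,) position of the target marker in the camera frame.
    """
    # Shift the target into a frame centred on the origin marker,
    # then undo the origin marker's orientation (R^T = R^-1 for rotations).
    diff = np.asarray(t_target, dtype=float) - np.asarray(t_origin, dtype=float)
    return np.asarray(R_origin, dtype=float).T @ diff
```

Applying this to every detected target marker yields the local 3D coordinates of the scene's characteristic points relative to the single origin marker, independent of where the camera happened to observe them from.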
