Institutional Repository
Technical University of Crete

Exploring efficiency and performance of image captioning

Frechat Nantia-Efthymia


URI: http://purl.tuc.gr/dl/dias/8C58FB2B-27F0-4FDF-B00F-15346E1477C1
Year: 2019
Type of Item: Diploma Work
Bibliographic Citation: Nantia-Efthymia Frechat, "Exploring efficiency and performance of image captioning", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2019. https://doi.org/10.26233/heallink.tuc.83634

Summary

Image captioning is a complex problem that combines the fields of computer vision and natural language processing. The task is to generate natural language sentences that describe the content of an image. Image captioning has several real-world applications with significant practical impact, from assisting users with visual impairments to personal assistants and human-robot interaction.

The progress in image captioning is a major success of Artificial Intelligence. It has been reported that under some metrics, such as BLEU or CIDEr, the latest techniques even exceed human performance.

In this thesis, we implement and present a machine learning model that combines recent developments in computer vision and machine translation and can be used to generate natural sentences that describe an image. Specifically, a Convolutional Neural Network was combined with a Recurrent Neural Network to obtain the desired results. The models were trained to maximize the likelihood of the target description given the training image.

Experiments on a large training dataset, the MSCOCO 2015 used in this thesis, demonstrate the accuracy of the model and the fluency of the language it learns solely from the image descriptions. Qualitative and quantitative evaluation shows that the model is often quite accurate.
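
The training objective mentioned in the summary, maximizing the likelihood of a description given an image, can be illustrated with a minimal sketch. This record does not include the thesis code, so the function name, the three-word toy vocabulary and the probabilities below are hypothetical. In an actual model, each per-step distribution would be produced by the recurrent decoder conditioned on the convolutional image features and the preceding words.

```python
import math


def caption_log_likelihood(step_probs, caption_ids):
    """Sum of log p(word_t | image, word_1..word_{t-1}) over one caption.

    step_probs:  one probability distribution over the vocabulary per step
                 (hypothetical stand-in for the decoder's softmax outputs).
    caption_ids: index of the ground-truth word at each step.
    """
    if len(step_probs) != len(caption_ids):
        raise ValueError("need exactly one distribution per caption token")
    total = 0.0
    for probs, word_id in zip(step_probs, caption_ids):
        total += math.log(probs[word_id])
    return total


# Toy example (assumed values): vocabulary of 3 words, caption of 2 tokens.
step_probs = [
    [0.5, 0.25, 0.25],  # distribution predicted at step 1
    [0.1, 0.8, 0.1],    # distribution predicted at step 2
]
caption_ids = [0, 1]

# Training minimizes the negative log-likelihood, which is equivalent
# to maximizing the likelihood of the target description.
loss = -caption_log_likelihood(step_probs, caption_ids)
```

Working in log space turns the product of per-word probabilities into a sum, which keeps the computation numerically stable for long captions.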
