Institutional Repository
Technical University of Crete
EN  |  EL

Search

Browse

My Space

FACESiR: face and speaker identity recognition in video streams

Karageorgiadis Anastasios

Full record


URI: http://purl.tuc.gr/dl/dias/8CF4DC95-DFB4-4E5F-8429-C5510F81BE56
Year 2019
Type of Item Diploma Work
License
Details
Bibliographic Citation Anastasios Karageorgiadis, "FACESiR: face and speaker identity recognition in video streams", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2019 https://doi.org/10.26233/heallink.tuc.84791
Appears in Collections

Summary

Person indexing in video streams requires first to recognize a person’s identity and secondly finding the time slot in which a person appears. In this diploma thesis, we develop a method for identifying exposed speakers within a video stream using machine learning techniques. More specifically, with the help of Neural Networks, after we exploit the structure of a video as a sequence of images and sounds, we use these data for the identification of a speaker at each video frame. The above problem is divided into two sub-problems, Face Recognition and Speaker Recognition, where we use a top-down design to split them into smaller ones. Each sub-problem is solved individually, but the combination of their outputprobabilities per class leads to an improved final decision regarding classification. The method has been implemented in the Python programming language using the Tensorflow framework and the Keras API.The suggested approach is based on Convolutional Neural Network architectures for both Face and Speaker Recognition. As a result, the combination of image and sound leads to a better decision for the identity of a person who appears in a specific time slot of the video. In addition, the main advantage of the proposed method is that it can be utilized for many different use cases, such as search for missing persons, recognition of celebrities, or even promotion of public figures. It is also worth mentioning that with some minor changes it can be used for identifying any other entity in a video stream.

Available Files

Services

Statistics