<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D"><efrbr-work:titleOfTheWork>Accelerating deep reinforcement learning via imitation</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D"><efrbr-expression:titleOfTheExpression>Accelerating deep reinforcement learning via imitation</efrbr-expression:titleOfTheExpression><efrbr-expression:titleOfTheExpression>Επιτάχυνση διαδικασιών βαθιάς ενισχυτικής μάθησης μέσω μίμησης</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Διπλωματική Εργασία
            Diploma Work
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2020-02-24</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2020</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>Imitation has evolved in nature as an advanced behavioural tool for transferring knowledge between individuals. It can be observed in most higher-intelligence life forms, such as members of the simian (monkeys &amp; apes), Delphinidae (dolphins) and Corvus (crows, ravens, jackdaws) groups. Its advantages over instinctive behaviour and habituation are evident in the widespread success of animals capable of imitation learning across the world's ecosystems.
In machine learning, mimicry and imitation have been implemented in the form of supervised learning and have been used in reinforcement learning through explicit imitation techniques. Implicit imitation has also been tested as an alternative to direct knowledge transfer in single- and multi-agent systems, with the aim of accelerating individual agents' learning rates by using experience extracted from previous sessions or from other cooperative agents. Although these techniques have achieved promising results, they have not yet taken advantage of the recent success of neural networks and deep learning.
In this thesis, we propose applying implicit imitation to model-free deep reinforcement learning techniques in order to speed up the learning of the respective agents. Briefly, by extracting experience from a mentor agent and augmenting the Bellman backups of an observer agent so that it can benefit from this experience, we provide a form of guidance. The observer decides whether to trust or disregard that information based on a confidence-testing mechanism. We test our model on a DQN variant in classic control environments, and our experiments demonstrate accelerated learning. Although we limit our tests to a single deep reinforcement learning algorithm and simple settings, we discuss extensions of our model to other agents and more complex environments.</efrbr-expression:summarizationOfContent><efrbr-expression:summarizationOfContent>Η μίμηση έχει εξελιχθεί στο φυσικό περιβάλλον ως προηγμένο συμπεριφορικό εργαλείο για τη μεταφορά γνώσης μεταξύ οργανισμών. Μπορεί να παρατηρηθεί στις περισσότερες μορφές ζωής με υψηλότερα επίπεδα νοημοσύνης όπως στα μέλη των ομάδων simian (πρωτεύοντα, πίθηκοι), delphinidae (δελφίνια) και corvus (κοράκια, κίσσες). Τα πλεονεκτήματά της έναντι της ενστικτώδους δράσης μπορούν να παρατηρηθούν στην ξεκάθαρη επιτυχία των οργανισμών ικανών για μιμητική μάθηση σε όλα τα οικοσυστήματα του κόσμου.
Στη μηχανική μάθηση, η μίμηση έχει υλοποιηθεί με τη μορφή εποπτευόμενης μάθησης και έχει χρησιμοποιηθεί στην ενισχυτική μάθηση μέσω τεχνικών απευθείας μίμησης. Επιπρόσθετα, η έμμεση μίμηση έχει δοκιμαστεί ως εναλλακτική μέθοδος άμεσης μεταφοράς γνώσης τόσο σε μονοπρακτορικά όσο και σε πολυπρακτορικά συστήματα, με σκοπό την επιτάχυνση του ρυθμού εκπαίδευσης των πρακτόρων μέσω της χρήσης εμπειριών από προηγούμενες συνεδρίες ή άλλους πράκτορες. Παρόλο που οι τεχνικές αυτές έχουν παρουσιάσει πληθώρα υποσχόμενων αποτελεσμάτων, μέχρι σήμερα δεν έχουν επωφεληθεί από την πρόσφατη επιτυχία των τεχνητών νευρωνικών δικτύων και της βαθιάς μάθησης.
Σε αυτή τη διπλωματική εργασία, προτείνουμε την εφαρμογή έμμεσης μίμησης σε τεχνικές βαθιάς ενισχυτικής μάθησης χωρίς μοντέλο προκειμένου να επιταχυνθούν τα στάδια εκπαίδευσης των αντίστοιχων πρακτόρων. Εν συντομία, εξάγοντας την υπάρχουσα εμπειρία από έναν πράκτορα-μέντορα και μεταλλάσσοντας τις εξισώσεις Bellman ενός άλλου πράκτορα-παρατηρητή ώστε να μπορεί να επωφεληθεί από αυτή, καταφέρνουμε να προσφέρουμε έναν τρόπο καθοδήγησης της εκπαίδευσης. Ο πράκτορας-παρατηρητής αποφασίζει εάν θα εμπιστευτεί ή θα αγνοήσει τις πληροφορίες αυτές βάσει ενός μηχανισμού ελέγχου εμπιστοσύνης. Μέσω των πειραμάτων μας δοκιμάζουμε το μοντέλο μας σε μία παραλλαγή του αλγορίθμου DQN σε κλασικά περιβάλλοντα ελέγχου και παρουσιάζουμε επιταχυμένα επίπεδα μάθησης. Αν και περιορίζουμε τις δοκιμές μας σε έναν μόνο αλγόριθμο βαθιάς μάθησης και σε απλά περιβάλλοντα, σχολιάζουμε τις επεκτάσεις του μοντέλου μας σε άλλους πράκτορες και πιο πολύπλοκα περιβάλλοντα.</efrbr-expression:summarizationOfContent><efrbr-expression:contextForTheExpression>Διπλωματική εργασία</efrbr-expression:contextForTheExpression><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by-sa/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="academic unit">Πολυτεχνείο Κρήτης::Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών</efrbr-expression:note></efrbr-expression:expression><efrbr-manifestation:manifestation identifier="https://dias.library.tuc.gr/view/84658"><efrbr-manifestation:titleOfTheManifestation>Papathanasiou_Theodoros_Dip_2020.pdf</efrbr-manifestation:titleOfTheManifestation><efrbr-manifestation:publicationDistribution><efrbr-manifestation:placeOfPublicationDistribution type="distribution">Chania [Greece]</efrbr-manifestation:placeOfPublicationDistribution><efrbr-manifestation:publisherDistributor type="distributor">Library of TUC</efrbr-manifestation:publisherDistributor><efrbr-manifestation:dateOfPublicationDistribution>2020-02-24</efrbr-manifestation:dateOfPublicationDistribution></efrbr-manifestation:publicationDistribution><efrbr-manifestation:formOfCarrier>application/pdf</efrbr-manifestation:formOfCarrier><efrbr-manifestation:extentOfTheCarrier>1 MB</efrbr-manifestation:extentOfTheCarrier><efrbr-manifestation:accessRestrictionsOnTheManifestation>campus</efrbr-manifestation:accessRestrictionsOnTheManifestation></efrbr-manifestation:manifestation><efrbr-person:person identifier="http://users.isc.tuc.gr/~tpapathanasiou"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Papathanasiou Theodoros
            Παπαθανασιου Θεοδωρος
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~gchalkiadakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Chalkiadakis Georgios
            Χαλκιαδακης Γεωργιος
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~abletsas"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Bletsas Aggelos
            Μπλετσας Αγγελος
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~vsamoladas"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Samoladas Vasilis
            Σαμολαδας Βασιλης
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-corporateBody:corporateBody identifier="39699227-FC50-49BA-9078-49F4B231483C"><efrbr-corporateBody:nameOfTheCorporateBody vocabulary="">
            Πολυτεχνείο Κρήτης
            Technical University of Crete
         </efrbr-corporateBody:nameOfTheCorporateBody></efrbr-corporateBody:corporateBody><efrbr-concept:concept identifier="C0902472-732A-4E4F-91FE-C629C162A6E8"><efrbr-concept:termForTheConcept>
            Deep reinforcement learning
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="0ADC4AA0-5349-4828-A88F-BC7787B980B8"><efrbr-concept:termForTheConcept>
            Imitation learning
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="8165B2F5-6ADF-481C-9D32-CA8A20E10902"><efrbr-concept:termForTheConcept>
            Artificial intelligence
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="expression" targetURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D"/><efrbr-structure:embodiedIn sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="manifestation" targetURI="http://purl.tuc.gr/dl/dias/6101022F-CBCB-4FE4-8644-87B5563758D4"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="person" targetURI="http://users.isc.tuc.gr/~tpapathanasiou"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="person" targetURI="http://users.isc.tuc.gr/~tpapathanasiou" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="person" targetURI="http://users.isc.tuc.gr/~gchalkiadakis" role="http://purl.tuc.gr/dl/dias/vocabs/contributor-roles/1"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="person" targetURI="http://users.isc.tuc.gr/~abletsas" role="http://purl.tuc.gr/dl/dias/vocabs/contributor-roles/2"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="person" targetURI="http://users.isc.tuc.gr/~vsamoladas" role="http://purl.tuc.gr/dl/dias/vocabs/contributor-roles/2"/><efrbr-responsible:realizedBy sourceEntity="expression" 
sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="corporateBody" targetURI="39699227-FC50-49BA-9078-49F4B231483C" role="publisher"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="concept" targetURI="C0902472-732A-4E4F-91FE-C629C162A6E8"/><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="concept" targetURI="0ADC4AA0-5349-4828-A88F-BC7787B980B8"/><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/668C47FF-2EC6-4DDA-BD5A-9CF6B3A7BF5D" targetEntity="concept" targetURI="8165B2F5-6ADF-481C-9D32-CA8A20E10902"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>