Institutional Repository
Technical University of Crete

Semantic similarity computation and word sense induction using hidden sets multidimensional scaling

Athanasopoulou Georgia


URI: http://purl.tuc.gr/dl/dias/17E8583A-E9FD-47A3-9217-D6D3CE6A642D
Year: 2016
Type of Item: Master Thesis
Bibliographic Citation: Georgia Athanasopoulou, "Semantic similarity computation and word sense induction using hidden sets multidimensional scaling", Master Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2016. https://doi.org/10.26233/heallink.tuc.66075

Summary

In this thesis, motivated by evidence from psycholinguistics and cognition, we propose an unsupervised, language-agnostic Distributional Semantic Model (DSM) that utilizes web-harvested data for the problem of semantic similarity estimation. Semantic similarity can be applied to numerous Natural Language Processing (NLP) tasks, such as affective text analysis and paraphrasing.

In the first part of the thesis, we present the construction of typical DSMs that follow the well-established Vector Space Model. More specifically, we describe the creation of corpora by harvesting web documents with a query-based approach, as well as the state-of-the-art DSMs used to compute semantic similarity from these corpora.

Next, we propose a novel hierarchical distributional semantic model (DSM) that is inspired by evidence from psycholinguistics and cognition and consists of low-dimensional manifolds built on semantic neighborhoods. Each manifold is sparsely encoded and mapped into a low-dimensional space. Global operations are decomposed into local operations in multiple sub-spaces, and the results of these local operations are fused to produce semantic relatedness estimates. Manifold DSMs are constructed starting from a pairwise word-level semantic similarity matrix.

The proposed model is evaluated against state-of-the-art and baseline DSMs on a semantic similarity estimation task, in which the similarity metrics are compared with human similarity ratings. The proposed model significantly improves performance compared to the baseline approaches for the task of semantic similarity estimation between words. Furthermore, the proposed model is evaluated on a taxonomy task, achieving state-of-the-art results. Finally, motivated by evidence that concepts are cognitively organized according to their degree of concreteness, we present the performance of the proposed DSM for abstract and concrete nouns.
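
The pipeline outlined in the summary (pairwise similarity matrix, semantic neighborhoods, low-dimensional mapping, fusion of local results) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes classical multidimensional scaling as the mapping, a fixed neighborhood size, the distance 1 - similarity, and a plain mean as the fusion rule, and it omits the sparse encoding step. All function names, parameters, and the toy matrix are hypothetical.

```python
import numpy as np


def classical_mds(dist, dim):
    """Embed points into `dim` dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def neighborhood(sim, anchor, size):
    """Indices of `anchor` plus its `size - 1` most similar words."""
    order = np.argsort(-sim[anchor], kind="stable")
    order = order[order != anchor][: size - 1]
    return np.concatenate(([anchor], order))


def local_similarity(sim, i, j, anchor, size, dim):
    """Similarity of words i and j inside one neighborhood sub-space.

    Returns None when the neighborhood does not contain both words.
    """
    idx = neighborhood(sim, anchor, size)
    pos = {int(word): k for k, word in enumerate(idx)}
    if i not in pos or j not in pos:
        return None
    # Assumption: distance = 1 - similarity within the neighborhood.
    coords = classical_mds(1.0 - sim[np.ix_(idx, idx)], dim)
    gap = np.linalg.norm(coords[pos[i]] - coords[pos[j]])
    return 1.0 / (1.0 + gap)


def fused_similarity(sim, i, j, size=4, dim=2):
    """Average the local estimates over every sub-space holding both words."""
    scores = []
    for anchor in range(sim.shape[0]):
        score = local_similarity(sim, i, j, anchor, size, dim)
        if score is not None:
            scores.append(score)
    # Fall back to the input similarity if no sub-space contains both words.
    return float(np.mean(scores)) if scores else float(sim[i, j])


# Toy pairwise similarity matrix with two clusters: words {0, 1, 2} and {3, 4, 5}.
toy_sim = np.full((6, 6), 0.1)
toy_sim[:3, :3] = 0.8
toy_sim[3:, 3:] = 0.8
np.fill_diagonal(toy_sim, 1.0)

print(fused_similarity(toy_sim, 0, 1))  # words from the same cluster
print(fused_similarity(toy_sim, 0, 3))  # words from different clusters
```

In this toy setting the same-cluster pair receives the higher fused score. In practice the neighborhood size, the sub-space dimension, and the fusion rule would all need tuning against human similarity ratings.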
