Institutional Repository
Technical University of Crete

Multi-document text summarization

Kritharakis Emmanouil

Year: 2019
Type of Item: Diploma Work
Bibliographic Citation: Emmanouil Kritharakis, "Multi-document text summarization", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2019

In the past few years, automatic text summarization has attracted increasing interest, since it can aid many applications by condensing the large quantities of available information into short, concise summaries. In this direction, text summarization with sequence-to-sequence (seq2seq) models has drawn the attention of the research community, and similar encoder-decoder architectures have also been applied to multi-document text summarization. However, adapting seq2seq models to the multi-document summarization task is not always successful and requires advanced attention mechanisms to avoid unnecessary repetitions. In this thesis, we propose a novel attention mechanism, based on sentence similarity, to improve the multi-document text summarization process. With the proposed attention mechanism, the text summarizer takes into account the semantic and syntactic nature of the sentences, which is particularly useful in a multi-document dataset. To investigate the effectiveness of the sentence similarity algorithm, two families of experiments were conducted. In the first, we compared the proposed algorithm to a similar, recently published sentence similarity method. Using the Pearson correlation coefficient and other statistical metrics, we show that our algorithm obtains significantly improved performance. In the second family of experiments, we integrated the sentence similarity algorithm as an attention mechanism into the text summarizer. The evaluation under several automated metrics shows that the proposed methodology outperforms other state-of-the-art text summarization techniques on the multi-document newswire topics from the DUC-2004 and TAC-2011 datasets.
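For readers unfamiliar with the first family of experiments, the following is a minimal sketch of how a Pearson correlation coefficient between a similarity method's scores and human similarity ratings could be computed. It is not the thesis code: the function name `pearson` and the sample scores are hypothetical and serve only as an illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: human ratings and scores from a similarity method
    # for the same four sentence pairs (made up for illustration only).
    human_ratings = [0.0, 1.5, 3.0, 4.5]
    method_scores = [0.1, 0.3, 0.6, 0.9]
    print(f"Pearson r = {pearson(human_ratings, method_scores):.3f}")
```

A value close to 1 would indicate that the method ranks and scales sentence pairs much as the human annotators do. The abstract also mentions other statistical metrics, which are not sketched here.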
