Abstractive text summarisation using recurrent neural networks at the paragraph level

Tchouya’a Ngoko, Israel Christian

Author(s)

Tchouya’a Ngoko, Israel Christian

Date Issued

2020

Type

Thesis

Publisher

Cape Peninsula University of Technology

Abstract

In this new century, the huge amount of data produced daily will remain useless unless we use emerging tools and technologies to make it accessible. There is a need for content summarisers to reduce manual summarisation, which is time-consuming and incurs massive costs.
In recent years, sequence-to-sequence learning has attracted growing interest. Text summarisation in natural language processing has largely been limited to extractive methods, which select the important sentences of the original text and combine them to form the final summary. The success of end-to-end training of encoder-decoder neural networks in machine translation has prompted research applying the same architectures to tasks such as paraphrase generation and abstractive text summarisation.
Abstractive text summarisation attempts to capture the main content of a text and compress it while preserving its meaning and its semantic and grammatical correctness. It generates dynamic paraphrases and produces natural summaries, but it has, until recently, been attempted less often and is less well understood. Sequence-to-sequence models based on Recurrent Neural Networks (RNNs) are able to link the input and output data in an encoder-decoder architecture, and adding attention mechanisms to the RNN layers further improves the output summaries. Research has shown that these architectures perform well with attention mechanisms in machine translation. Abstractive text summarisation using recurrent neural networks with attention mechanisms at the sentence level has produced better results and has outperformed the recent state-of-the-art model for abstractive text summarisation. However, for longer documents, the summaries these models produce often contain grammatical errors. In this investigation we employ a data-driven approach using recurrent neural networks at the paragraph level and train the model end-to-end to predict the summary for a given text document. We evaluate this model on the DUC 2004 datasets. Our model produces higher-quality summaries and obtains a ROUGE-1 score of 44.44, a ROUGE-2 score of 22.50 and a ROUGE-L score of 45.15 on the DUC 2004 datasets.
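For readers unfamiliar with the ROUGE metrics named in the abstract, the following is a minimal sketch of recall-oriented ROUGE-N and ROUGE-L. It is an illustration only: the abstract does not state which ROUGE variant (recall, precision or F-measure) or which toolkit was used, so the function names, the whitespace tokenisation and the example sentences are all assumptions and are not taken from the thesis or from DUC 2004. The functions return values between 0 and 1, whereas published ROUGE scores are commonly reported multiplied by 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Assumption: lowercased whitespace tokenisation, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as many
    # times as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length (recall form of ROUGE-L)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the functions.
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
    print(rouge_l_recall(candidate, reference))
```

ROUGE-1 and ROUGE-2 reward overlapping words and word pairs, while ROUGE-L rewards the longest in-order word sequence shared with the reference, which is why the three scores in the abstract can differ for the same summaries.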

Additional information

Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2020

Subjects

Computational linguis...

Semantic computing

Automatic abstracting...

Neural networks (Comp...

File(s)

Name

Tchouya'a_Ngoko_Israel_212014269.pdf

Size

1.71 MB

Format

Adobe PDF

Checksum

(MD5):fcb7cba75a12fd7f2107c423d6bdfbe6