Dissertation: Spoiler Alert: Deep Learning with NLP to identify spoilers

Abstract

Social media has become an endless sea of information, some of which you want to see and some of which you don’t. This project uses deep learning on a dataset of IMDb movie reviews to train neural network models to detect spoilers in text. A spoiler is information that ruins a viewer’s sense of surprise; this paper focuses on movie spoilers in particular, hence the use of the IMDb dataset. While other research in this area exists, it mainly uses book review data, which does not generalize well to movies. Three types of models are developed for this task: bag-of-words, recurrent neural networks, and transformers. Final models are evaluated using accuracy, recall, precision, and F1 score. The best results are achieved by the transformer model, which uses DistilBERT. When compared to previous research, this model achieves a higher F1 score.
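
As a rough illustration of the four evaluation metrics named in the abstract, the sketch below computes accuracy, precision, recall, and F1 for a binary spoiler classifier. It is not taken from the project code: the function name `binary_metrics`, the label convention (1 = spoiler, 0 = not a spoiler), and the toy labels are all assumptions made for this example.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed convention (hypothetical, not from the project): 1 = spoiler, 0 = no spoiler.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the reviews flagged as spoilers, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real spoilers, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

F1 is a useful headline figure when the two classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.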

Access Project

Read the final report here:

The models built can be found on the GitHub repository for this project:

GitHub - ysmnpksy/Final-Project